Real Party In Interest

This Application is owned by OpinionLab, Inc., as indicated by an Assignment 
recorded on July 29, 2003 in the Assignment Records of the PTO at Reel 014359, Frame 
0548 (6 pages). 

Related Appeals and Interferences 

No known appeals, interferences, or judicial proceedings are related to or will directly 
affect, be directly affected by, or have a bearing on the Board's decision regarding this 
Appeal. 

Status of Claims 

Claims 1-34 are pending in this Application, stand rejected pursuant to the Final 
Office Action mailed November 14, 2006, and are presented for appeal. All pending claims 
are shown in Appendix A, along with an indication of the status of those claims. 

Status of Amendments 

All amendments submitted by Appellants were entered by the Examiner prior to the 
mailing of the Final Office Action. 

Summary of Claimed Subject Matter 

The claimed invention relates to systems and methods for providing substantially real- 
time access to collected information concerning user interaction with a web page of a 
website. (Page 1, Lines 7-8). Certain embodiments of the invention may enable a website 
owner to, while viewing a particular web page, perform a substantially real-time look-up of 
user feedback information concerning the particular web page that has been collected from 
users who have accessed the particular web page. (Page 5, Lines 23-26). Certain 
embodiments may be useful to a content manager who is responsible for particular web pages 
of a website and wants to analyze user feedback information concerning only those particular 
web pages. (Page 5, Lines 27-29). Certain embodiments may also be useful to others who 
may want to review unfiltered user feedback information directly from any web page of a 
company's website. (Page 5, Line 30 - Page 6, Line 2).

Figure 1 illustrates an example system for measuring and reporting user feedback to
particular web pages. According to certain embodiments, system 10 may include web
server 14 and reporting
server 18. (Page 9, Lines 27-28). Web server 14 hosts or otherwise supports at least one 
website 26 including one or more web pages 28. (Page 10, Lines 11-12). According to
certain embodiments, a user 16 may establish a connection to server 14 and access a 
particular web page 28. (Page 10, Lines 17-20). Each user 16 may have an opinion, 
assessment, feeling, or other subjective reaction to each web page 28 communicated to the 
user 16, either in its entirety or more specifically to the format, content, design, or another 
characteristic associated with web page 28. (Page 10, Line 29 - Page 11, Line 1). In certain 
embodiments, feedback from a user 16 concerning web page 28 may reflect one or more 
reactions of user 16 to web page 28 and may, where appropriate, include ratings, comments, 
answers to explicit questions, or any other suitable general or specific user feedback 
concerning web page 28. (Page 11, Lines 12-17). 



In one embodiment, server 14 supports a user feedback measurement tool 30 that is 
incorporated into web page 28 and may be communicated to user 16 with web page 28. 
(Page 11, Lines 23-25). In a particular embodiment, tool 30 includes software code 
incorporated into the HTML, XML, or other software code of web page 28. Tool 30 may 
also include one or more scripts that may be stored in a dedicated or other suitable directory.
(Page 11, Line 29 - Page 12, Line 1). Figures 3-6 illustrate example feedback measurement
tools according to particular embodiments.

In certain embodiments, web page 28 may provide website owner 12 access to
collected user feedback information concerning a particular web page 28. (Page 50, Lines 
24-25). In particular embodiments, website owner 12 may launch a feedback-viewing
application for viewing collected feedback information concerning web page 28 by accessing
web page 28 using a web browser and, while web page 28 is viewable, entering a particular
keystroke using a keyboard or otherwise indicating a desire to access collected user feedback
information concerning web page 28. (Page 51, Lines 12-16). The feedback-viewing
application launched by website owner 12 may be provided by any suitable software
component associated with web page 28. (Page 51, Lines 27-28). As an example, in
particular embodiments, feedback measurement tool 30 may include one or more modules
that provide the feedback-viewing application. (Page 51, Lines 29-30).

Figure 13 illustrates an example password window 500 that may be presented to
website owner 12 after the feedback-viewing application has been launched. (Page 52, Lines 3-5).
In certain embodiments, the feedback-viewing application may require website owner 12 to
enter a valid subscriber ID and a valid password before the feedback-viewing application
provides website owner 12 access to collected user feedback information concerning web
page 28. (Page 52, Lines 5-8). For example, the feedback-viewing application may
determine whether the subscriber ID and the password entered by website owner 12 are valid.
(Page 52, Lines 13-15). According to this example, if the subscriber ID and the password
entered by website owner 12 are valid, the feedback-viewing application may allow website
owner 12 to access collected user feedback information concerning web page 28. (Page 52,
Lines 15-17). Figures 15 and 16 illustrate example displays that may be generated by
the feedback-viewing application to present collected user feedback to website owner 12.

Figure 19 illustrates an example method for providing substantially real-time access
to collected information concerning user interaction with a web page of a website. (Page 8,
Lines 3-5). The example method begins at step 600, where website owner 12 accesses web
page 28. (Page 57, Lines 29-30). At step 602, while web page 28 is still viewable, website
owner 12 enters a particular keystroke or other suitable input to launch a feedback-viewing
application associated with web page 28. (Page 57, Line 29 - Page 58, Line 1). At step 604,
the feedback-viewing application presents password window 500 to website owner 12.
(Page 57, Lines 28-29). At step 606, website owner 12 enters a subscriber ID and a password
and selects login button 506. At step 608, if the subscriber ID and the password entered by
website owner 12 are invalid, the method returns to step 604. If the subscriber
ID and the password entered by website owner 12 are valid, the example method proceeds to
step 610. At step 610, the feedback-viewing application presents time frame window 510 to
website owner 12. At step 612, website owner 12 enters a time frame and selects run-report
button 514. At step 614, the feedback-viewing application presents report page 516 to
website owner 12 according to the entered time frame, at which point the example method
ends.
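
For illustration only, the control flow of steps 602-614 may be sketched in Python as follows. The sketch is not taken from the Specification: the function name, the credential table, and the record format are hypothetical assumptions introduced solely to show the order of the steps, including the return from step 608 to step 604 on an invalid login.

```python
# Hypothetical credential table (subscriber ID -> password); an assumption
# made for this sketch, not a detail disclosed in the Specification.
VALID_CREDENTIALS = {"subscriber-1": "secret"}


def run_feedback_viewer(records, login_attempts, time_frame):
    """Walk steps 602-614 for one web page.

    records        -- list of (day, rating) tuples collected for the page
    login_attempts -- sequence of (subscriber_id, password) pairs entered
    time_frame     -- (first_day, last_day), inclusive

    Returns (report, steps): the filtered records (or None if no login
    attempt succeeds) and the ordered list of step numbers visited.
    """
    steps = [602]  # owner enters the keystroke while the page is viewable
    for subscriber_id, password in login_attempts:
        steps.append(604)  # password window is presented
        steps.append(606)  # owner enters subscriber ID and password
        steps.append(608)  # credentials are checked
        if VALID_CREDENTIALS.get(subscriber_id) == password:
            break  # valid: proceed to step 610
        # invalid: loop back to step 604
    else:
        return None, steps  # no valid login was ever entered

    steps.append(610)  # time frame window is presented
    first_day, last_day = time_frame
    steps.append(612)  # owner enters a time frame and runs the report
    steps.append(614)  # report page is presented for that time frame
    report = [r for r in records if first_day <= r[0] <= last_day]
    return report, steps
```

Under these assumptions, one failed login followed by a valid one visits steps 604-608 twice before the report is produced, and the report contains only the records that fall within the entered time frame.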

With regard to the independent claims currently under Appeal, Appellants provide the
following concise explanation of the subject matter recited in the claim elements. For
brevity, Appellants do not necessarily identify every portion of the Specification and
drawings relevant to the recited claim elements. Additionally, this explanation should not be
used to limit Appellants' claims but is intended to assist the Board in considering the Appeal
of this Application.

For example, independent Claim 1 recites:

A system for providing substantially real-time access to collected 
information concerning user interaction with a particular web page of a 
website, the system comprising: 

first software associated with a particular web page of a website and 
operable to collect information concerning user interaction with the particular 
web page (see e.g., Page 11, Line 23 - Page 13, Line 5; Page 21, Lines 3-6); 
and 

second software associated with the particular web page operable to: 

receive, from a website owner who has accessed the particular web 
page using a web browser while the particular web page is viewable 
within a browser window of the web browser, input indicating a desire 
to access the collected information concerning user interaction with the 
particular web page (see e.g., Page 50, Line 24 - Page 52, Line 2; Page
57, Line 30 - Page 58, Line 1); 

determine whether the website owner is authorized to access the 
collected information concerning user interaction with the particular web 
page (see e.g., Page 52, Lines 3-20; Page 58, Lines 3-7); and

if the website owner is authorized to access the collected 
information concerning user interaction with the particular web page: 

generate a viewable user interface providing substantially 
real-time access to the collected information concerning user 
interaction with the particular web page (see e.g., Page 54, Line 14
- Page 55, Line 6; Page 58, Lines 11-13); and 

to provide the website owner substantially real-time access to 
the collected information concerning user interaction with the 
particular web page, present the viewable user interface to the 
website owner in substantially real-time in response to the input 
received from the website owner while the particular web page was 
viewable within the browser window of the web browser (see e.g., 
Page 8, Lines 3-5; Page 57, Line 28 - Page 58, Line 19). 

As another example, independent Claim 12 recites: 

A method for providing substantially real-time access to collected 
information concerning user interaction with a particular web page of a 
website, the method comprising: 

collecting information concerning user interaction with a particular web 
page of a website (see e.g., Page 11, Line 23 - Page 13, Line 5; Page 21, Lines 
3-6);

receiving, from a website owner who has accessed the particular web
page using a web browser while the particular web page is viewable within a
browser window of the web browser, input indicating a desire to access the
collected information concerning user interaction with the particular web page
(see e.g., Page 50, Line 24 - Page 52, Line 2; Page 57, Line 30 - Page 58, Line
1);

determining whether the website owner is authorized to access the 
collected information concerning user interaction with the particular web 
page (see e.g., Page 52, Lines 3-20; Page 58, Lines 3-7); and

if the website owner is authorized to access the collected information 
concerning user interaction with the particular web page: 

generating a viewable user interface providing substantially real- 
time access to the collected information concerning user interaction with 
the particular web page (see e.g., Page 54, Line 14 - Page 55, Line 6;
Page 58, Lines 11-13); and 

to provide the website owner substantially real-time access to the 
collected information concerning user interaction with the particular web 
page, presenting the viewable user interface to the website owner in 
substantially real-time in response to the input received from the website 
owner while the particular web page was viewable within the browser 
window of the web browser (see e.g., Page 8, Lines 3-5; Page 57, Line
28 - Page 58, Line 19). 

As another example, independent Claim 23 recites: 

Software for providing substantially real-time access to collected 
information concerning user interaction with a particular web page of a 
website while the particular web page is viewable, the software embodied in 
media and when executed operable to: 

receive, from a website owner who has accessed a particular web page 
of a website and using a web browser while the particular web page is 
viewable within a browser window of the web browser, input indicating a 
desire to access collected information concerning user interaction with the 
particular web page (see e.g., Page 50, Line 24 - Page 52, Line 2; Page 57,
Line 30 - Page 58, Line 1); 

determine whether the website owner is authorized to access the 
collected information concerning user interaction with the particular web page 
(see e.g., Page 52, Lines 3-20; Page 58, Lines 3-7); and

if the website owner is authorized to access the collected information 
concerning user interaction with the particular web page: 

generate a viewable user interface providing substantially real-time 
access to the collected information concerning user interaction with the
particular web page (see e.g., Page 54, Line 14 - Page 55, Line 6; Page
58, Lines 11-13); and 

to provide the website owner substantially real-time access to the 
collected information concerning user interaction with the particular web 
page, present the viewable user interface to the website owner in 
substantially real-time in response to the input received from the website 
owner while the particular web page was viewable within the browser 
window of the web browser (see e.g., Page 8, Lines 3-5; Page 57, Line 
28 - Page 58, Line 19). 

As another example, independent Claim 34 recites: 

A system for providing substantially real-time access to collected 
information concerning user interaction with a particular web page of a 
website, the system comprising: 

means for collecting information concerning user interaction with a 
particular web page of a website (see e.g., Page 11, Line 23 - Page 13, Line 5; 
Page 21, Lines 3-6); and 

means for receiving, from a website owner who has accessed the 
particular web page using a web browser and while the particular web page is 
viewable within a browser window of the web browser, input indicating a 
desire to access the collected information concerning user interaction with the 
particular web page (see e.g., Page 50, Line 24 - Page 52, Line 2; Page 57, 
Line 30 - Page 58, Line 1); 

means for determining whether the website owner is authorized to 
access the collected information concerning user interaction with the 
particular web page (see e.g., Page 52, Lines 3-20; Page 58, Lines 3-7); and 

means for, if the website owner is authorized to access the collected 
information concerning user interaction with the particular web page: 

generating a viewable user interface providing substantially real- 
time access to the collected information concerning user interaction with 
the particular web page (see e.g., Page 54, Line 14 - Page 55, Line 6; 
Page 58, Lines 11-13); and 

to provide the website owner substantially real-time access to the 
collected information concerning user interaction with the particular web 
page, presenting the viewable user interface to the website owner in 
substantially real-time in response to the input received from the website 
owner while the particular web page is viewable within the browser 
window of the web browser (see e.g., Page 8, Lines 3-5; Page 57, Line 
28 - Page 58, Line 19).

Grounds of Rejection to be Reviewed on Appeal

Appellants request the Board to review the Examiner's rejection of Claims 1-5, 10-16, 
21-27, and 32-34 under 35 U.S.C. § 102(e). 

Appellants request the Board to review the Examiner's rejection of Claims 6-9, 17-20, 
and 28-31 under 35 U.S.C. § 103(a). 

Argument 

For at least the following reasons, the Examiner's rejections of Claims 1-34 are 
improper and should be reversed. 

I. Claims 1-5, 10-16, 21-27 and 32-34 are Allowable over Muret 

In the Final Office Action, the Examiner rejected Claims 1-5, 10-16, 21-27, and 32-34 
under 35 U.S.C. § 102(e) as being anticipated by U.S. Patent No. 6,792,458 B1 to Muret, et al.
("Muret"). Appellants respectfully disagree. A copy of Muret is included in Appendix B.

Independent Claim 1 recites: 

A system for providing substantially real-time access to collected 
information concerning user interaction with a particular web page of a 
website, the system comprising: 

first software associated with a particular web page of a website and 
operable to collect information concerning user interaction with the particular 
web page; and 

second software associated with the particular web page operable to:

receive, from a website owner who has accessed the 
particular web page using a web browser while the particular web 
page is viewable within a browser window of the web browser, 
input indicating a desire to access the collected information 
concerning user interaction with the particular web page; 

determine whether the website owner is authorized to access 
the collected information concerning user interaction with the 
particular web page; and 

if the website owner is authorized to access the collected 
information concerning user interaction with the particular web page:

generate a viewable user interface providing
substantially real-time access to the collected information 
concerning user interaction with the particular web page; and 

to provide the website owner substantially real-time 
access to the collected information concerning user interaction 
with the particular web page, present the viewable user 
interface to the website owner in substantially real-time in 
response to the input received from the website owner 
while the particular web page was viewable within the 
browser window of the web browser. 



Independent Claims 12, 23, and 34 recite substantially similar limitations. Appellants 
respectfully submit that Muret fails to disclose, teach, or suggest the combination of elements 
recited in any of Claims 1, 12, 23, and 34. For example, Muret fails to disclose, teach, or
suggest software "associated with the particular web page" such that, as recited in Claim 1, it 
is operable to: 

(1) "receive, from a website owner who has accessed the particular web page using a 
web browser while the particular web page is viewable within a browser window 
of the web browser, input indicating a desire to access the collected information 
concerning user interaction with the particular web page;" and 

(2) "present the viewable user interface to the website owner in substantially real-time 
in response to the input received from the website owner while the particular web 
page was viewable within the browser window of the web browser." 

The specification describes certain technical advantages associated with particular 
embodiments of a system that includes the features recited in Claim 1. According to the 
specification, "particular embodiments may enable a website owner to, while viewing a 
particular web page, perform a substantially real-time look-up of user feedback information 
concerning the particular web page." (Page 5, Lines 23-25). These embodiments may be 
advantageous to those who may want to review user feedback information "directly from any 
web page of a company's website." (Page 5, Line 30 - Page 6, Line 2). 

The Examiner asserts that the "user 530 send[ing] a report request 540 to the report 
engine 400 via a web browser 520" disclosed in Muret can be properly construed as "second 
software associated with the particular web page operable to: receive, from a website owner 
who has accessed the particular web page using a web browser while the particular web page 
is viewable within a browser window of the web browser, input indicating a desire to access
the collected information concerning user interaction with the particular web page," as recited
in Claim 1. (See Final Office Action, Pages 4-5). However, nothing in Muret discloses that
report request 540, report engine 400, or web browser 520 are "associated with the particular
web page" as recited in Claim 1. Accordingly, neither the report request 540, the report
engine 400, nor the web browser 520 can be properly construed as "second software
associated with the particular web page" as recited in Claim 1.

Furthermore, to the extent the Examiner is asserting that user 530 can be properly 
construed as "a website owner," as recited in Claim 1, Appellants respectfully submit that 
nothing in Muret discloses that user 530 accesses the particular web page, much less that any 
software disclosed in Muret receives input from user 530 (or any other entity) indicating a 
desire to access information concerning the particular web page, "while the particular web 
page is viewable within a browser window of the web browser." Accordingly, the report
engine 400 receiving a report request 540 sent by user 530, disclosed in Muret, cannot be
properly construed as "second software associated with the particular web page operable to:
receive, from a website owner who has accessed the particular web page using a web browser
while the particular web page is viewable within a browser window of the web browser,
input indicating a desire to access the collected information concerning user interaction with
the particular web page," as recited in Claim 1.

The Examiner also asserts that system 100 with report engine 400, disclosed in Muret, 
can be properly construed as "present[ing] the viewable user interface to the website owner in
substantially real-time in response to the input received from the website owner while the
particular web page was viewable within the browser window of the web browser," as recited
in Claim 1. (See Final Office Action, Pages 5-6). However, as discussed above, nothing in
Muret discloses that any software receives input from user 530 (or any other entity)
indicating a desire to access information concerning the particular web page, "while the
particular web page is viewable within a browser window of the web browser."
Accordingly, Muret necessarily fails to disclose software associated with the particular web
page operable to "present the viewable user interface to the website owner in substantially
real-time in response to the input received from the website owner while the particular web
page was viewable within the browser window of the web browser" as recited in Claim 1.

Thus, Muret fails to disclose, teach, or suggest each and every limitation recited in
Claim 1. Claim 1 is allowable for at least this reason and Claims 12, 23, and 34 are allowable
for at least substantially similar reasons. Dependent Claims 2-5, 10-11, 13-16, 21-22, 24-27
and 32-33 are allowable at least because they depend from Claims 1, 12, and 23.

For at least these reasons, Appellants respectfully request that the Board reverse the 
Examiner's rejection of Claims 1-5, 10-16, 21-27 and 32-34. 

II. Claims 6-9, 17-20, and 28-31 are Allowable over the Proposed Muret-Kurzrok 
Combination 

In the Final Office Action, the Examiner rejects Claims 6-9, 17-20, and 28-31 under 
35 U.S.C. § 103(a) as being unpatentable over Muret in view of U.S. Patent No. 6,260,064 B1
to Kurzrok ("Kurzrok"). Appellants respectfully disagree. A copy of Kurzrok is included in 
Appendix B. 

Dependent Claims 6-9, 17-20, and 28-31 depend from independent Claims 1, 12, and 
23 respectively. With respect to the elements of independent Claims 1, 12, and 23, the 
Examiner relies on the disclosure of Muret and cites to portions of Kurzrok as allegedly 
disclosing certain additional elements recited in these dependent claims. However, as shown 
above, Muret fails to disclose, teach, or suggest each and every limitation recited in any of 
independent Claims 1, 12, and 23. Appellants respectfully submit that these inadequacies of 
Muret are not remedied by the proposed combination of Muret with Kurzrok. 

Kurzrok discloses a system for collecting ratings from a reader of certain content on a 
web site. (Column 1, Lines 54-61). The system of Kurzrok compiles these ratings in a 
database. (Column 3, Lines 25-27). When a request for a rating summary is received, a 
rating for the content is calculated and the data is sent to the requester. (Column 4, Lines 23- 
63). Kurzrok does not disclose how the request for a rating summary is generated or what, if 
any, relationship exists between a web site "reader" and the rating summary "requester." 
Accordingly, Kurzrok necessarily does not disclose, teach, or suggest that the web site reader 
requests a rating summary for the content that the web site reader has read or viewed.
Kurzrok also necessarily does not disclose that the rating summary requester reads or views
the content that is the subject of the rating summary. 

The Examiner states, "Kurzrok's method is for content providers to retrieve rating
summaries [and] it is common knowledge that a content provider would have viewed their
content." (Final Office Action, Page 35). However, even if the Examiner's statement were
correct, which Appellants do not concede, whether a content provider had at some previous
point in time viewed something does not disclose, teach, or suggest software associated with
a particular web page operable to "receive, from a website owner who has accessed the
particular web page using a web browser while the particular web page is viewable within a
browser window of the web browser, input indicating a desire to access the collected
information concerning user interaction with the particular web page," as recited in Claim 1.

Thus, as with Muret, Kurzrok fails to disclose, teach, or suggest at least software 
"associated with the particular web page" such that, as recited in Claim 1, it is operable to: 

(1) "receive, from a website owner who has accessed the particular web page using a 
web browser while the particular web page is viewable within a browser window 
of the web browser, input indicating a desire to access the collected information 
concerning user interaction with the particular web page;" and 

(2) "present the viewable user interface to the website owner in substantially real-time 
in response to the input received from the website owner while the particular web 
page was viewable within the browser window of the web browser." 

Therefore, even if Muret could properly be combined with Kurzrok as the Examiner 
proposes, which Appellants do not concede, this combination would still fail to disclose, 
teach, or suggest each and every limitation recited in independent Claim 1, from which 
dependent Claims 6-9 depend. Dependent Claims 6-9 are allowable for at least these reasons 
and Claims 17-20 and 28-31 are allowable for substantially similar reasons. 

For at least these reasons, Appellants respectfully request that the Board reverse the 
Examiner's rejection of Claims 6-9, 17-20 and 28-31. 



ATTORNEY DOCKET NO. 067543.0184
PATENT APPLICATION 10/630,426



Conclusion 

Appellants have demonstrated that the present invention, as claimed, is clearly 
distinguishable over the prior art cited by the Examiner and that the Examiner's rejection of 
Claims 1-34 is improper. Therefore, Appellants respectfully request the Board to reverse the 
Examiner's rejection and to instruct the Examiner to issue a notice of allowance as to all 
pending claims. 

Appellants have enclosed a check in the amount of $250.00 for this Appeal Brief.
Appellants believe no additional fees are due. The Commissioner is hereby authorized to
charge any additional fee or credit any overpayment to Deposit Account No. 02-0384 of
Baker Botts L.L.P.



Respectfully submitted, 

BAKER BOTTS L.L.P. 
Attorneys for Appellants 




Christopher W. Kennerly 
Reg. No. 40,675 



Date:

CORRESPONDENCE ADDRESS:

Customer No. 05073

Appendix A: Claims on Appeal

1. (Previously Presented) A system for providing substantially real-time access
to collected information concerning user interaction with a particular web page of a website, 
the system comprising: 

first software associated with a particular web page of a website and operable to 
collect information concerning user interaction with the particular web page; and 
second software associated with the particular web page operable to: 

receive, from a website owner who has accessed the particular web page using 
a web browser while the particular web page is viewable within a browser window of 
the web browser, input indicating a desire to access the collected information 
concerning user interaction with the particular web page; 

determine whether the website owner is authorized to access the collected 
information concerning user interaction with the particular web page; and 

if the website owner is authorized to access the collected information 
concerning user interaction with the particular web page: 

generate a viewable user interface providing substantially real-time 
access to the collected information concerning user interaction with the 
particular web page; and 

to provide the website owner substantially real-time access to the 
collected information concerning user interaction with the particular web page, 
present the viewable user interface to the website owner in substantially real- 
time in response to the input received from the website owner while the 
particular web page was viewable within the browser window of the web 
browser. 

2. (Previously Presented) The system of Claim 1, wherein the second software is 
operable to: 

receive a password from the website owner; and 

to determine whether the website owner is authorized to access the collected 
information concerning user interaction with the particular web page, determine whether the 
password received from the website owner is valid. 

3. (Previously Presented) The system of Claim 1, wherein the second software is 
operable to: 

receive, from the website owner, one or more specified filter criteria applicable to the 
collected information concerning user interaction with the particular web page; and 

filter the collected information concerning user interaction with the particular web 
page according to the specified filter criteria such that the website owner is presented only 
particular collected information concerning user interaction with the particular web page 
matching the specified filter criteria. 

4. (Previously Presented) The system of Claim 3, wherein at least one of the 
filter criteria comprises a time frame associated with the collected information concerning 
user interaction with the particular web page. 

5. (Previously Presented) The system of Claim 1, wherein the collected 
information concerning user interaction with the particular web page is user traffic 
information.

6. (Previously Presented) The system of Claim 1, wherein the collected
information concerning user interaction with the particular web page is user feedback 
information concerning the particular web page. 

7. (Previously Presented) The system of Claim 6, wherein the second software is 
operable, if the website owner is authorized to access the collected user feedback 
information, to: 

generate a report of the collected user feedback information; and 

present the report to the website owner to provide the website owner access to the 
collected user feedback information while the particular web page is viewable within the 
browser window of the web browser. 

8. (Original) The system of Claim 7, wherein the report comprises one or more of:

a first display of a time frame associated with the collected user feedback information;

a second display providing an overview of the collected user feedback information; and

a third display of one or more sliding bars that each correspond to a particular type of 
collected user feedback information and indicate percentages of negative, neutral, and 
positive user feedback information of the corresponding particular type of collected user 
feedback information.

9. (Previously Presented) The system of Claim 7, wherein the report comprises a
display window operable to display one or more of: 

one or more charts of one or more general or specific user ratings of the particular 
web page; 

user comments regarding the particular web page; and 

one or more survey displays of user answers to one or more explicit questions 
regarding the particular web page. 

10. (Previously Presented) The system of Claim 1, wherein the input indicating a 
desire to access the collected information concerning user interaction with the particular web 
page comprises entry of one or more particular keystrokes using a keyboard. 

11. (Previously Presented) The system of Claim 1, wherein:

the particular web page comprises a first web page;

the website comprises one or more other web pages in addition to the first web page; and

the second software is further operable to: 

receive, from the website owner, a specification of one or more of the other 
web pages; and 

provide the website owner access from the first web page to collected 
information concerning user interaction with the specified other web pages in addition 
to the collected information concerning user interaction with the first web page.

12. (Previously Presented) A method for providing substantially real-time access
to collected information concerning user interaction with a particular web page of a website, 
the method comprising: 

collecting information concerning user interaction with a particular web page of a 
website; 

receiving, from a website owner who has accessed the particular web page using a 
web browser while the particular web page is viewable within a browser window of the web 
browser, input indicating a desire to access the collected information concerning user 
interaction with the particular web page; 

determining whether the website owner is authorized to access the collected 
information concerning user interaction with the particular web page; and 

if the website owner is authorized to access the collected information concerning user 
interaction with the particular web page: 

generating a viewable user interface providing substantially real-time access to
the collected information concerning user interaction with the particular web page; and

to provide the website owner substantially real-time access to the collected 
information concerning user interaction with the particular web page, presenting the 
viewable user interface to the website owner in substantially real-time in response to 
the input received from the website owner while the particular web page was 
viewable within the browser window of the web browser.

13. (Previously Presented) The method of Claim 12, comprising:

receiving a password from the website owner; and

to determine whether the website owner is authorized to access the collected 
information concerning user interaction with the particular web page, determining whether 
the password received from the website owner is valid. 

14. (Previously Presented) The method of Claim 12, comprising: 

receiving, from the website owner, one or more specified filter criteria applicable to 
the collected information concerning user interaction with the particular web page; and 

filtering the collected information concerning user interaction with the particular web 
page according to the specified filter criteria such that the website owner is presented only 
particular collected information concerning user interaction with the particular web page 
matching the specified filter criteria. 

15. (Previously Presented) The method of Claim 14, wherein at least one of the 
filter criteria comprises a time frame associated with the collected information concerning 
user interaction with the particular web page. 

16. (Previously Presented) The method of Claim 12, wherein the collected 
information concerning user interaction with the particular web page is user traffic 
information.

17. (Previously Presented) The method of Claim 12, wherein the collected
information concerning user interaction with the particular web page is user feedback 
information concerning the particular web page. 

18. (Previously Presented) The method of Claim 17, comprising, if the website 
owner is authorized to access the collected user feedback information: 

generating a report of the collected user feedback information; and 

presenting the report to the website owner to provide the website owner access to the
collected user feedback information while the particular web page is viewable within the
browser window of the web browser.

19. (Original) The method of Claim 18, wherein the report comprises one or more of:

a first display of a time frame associated with the collected user feedback information;

a second display providing an overview of the collected user feedback information; and

a third display of one or more sliding bars that each correspond to a particular type of 
collected user feedback information and indicate percentages of negative, neutral, and 
positive user feedback information of the corresponding particular type of collected user 
feedback information.

20. (Previously Presented) The method of Claim 18, wherein the report comprises
a display window operable to display one or more of: 

one or more charts of one or more general or specific user ratings of the particular 
web page; 

user comments regarding the particular web page; and 

one or more survey displays of user answers to one or more explicit questions 
regarding the particular web page. 

21. (Previously Presented) The method of Claim 12, wherein the input indicating 
a desire to access the collected information concerning user interaction with the particular 
web page comprises entry of one or more particular keystrokes using a keyboard. 

22. (Previously Presented) The method of Claim 12, wherein: 
the particular web page comprises a first web page; and 

the website comprises one or more other web pages in addition to the first web page; 
the method comprising: 

receiving, from the website owner, a specification of one or more of the other 
web pages; and 

providing the website owner access from the first web page to collected 
information concerning user interaction with the specified other web pages in addition 
to the collected information concerning user interaction with the first web page.

23. (Previously Presented) Software for providing substantially real-time access
to collected information concerning user interaction with a particular web page of a website 
while the particular web page is viewable, the software embodied in media and when 
executed operable to: 

receive, from a website owner who has accessed a particular web page of a website 
and using a web browser while the particular web page is viewable within a browser window 
of the web browser, input indicating a desire to access collected information concerning user 
interaction with the particular web page; 

determine whether the website owner is authorized to access the collected information 
concerning user interaction with the particular web page; and 

if the website owner is authorized to access the collected information concerning user 
interaction with the particular web page: 

generate a viewable user interface providing substantially real-time access to
the collected information concerning user interaction with the particular web page; and

to provide the website owner substantially real-time access to the collected 
information concerning user interaction with the particular web page, present the 
viewable user interface to the website owner in substantially real-time in response to 
the input received from the website owner while the particular web page was 
viewable within the browser window of the web browser.

24. (Previously Presented) The software of Claim 23, operable to:

receive a password from the website owner; and

to determine whether the website owner is authorized to access the collected 
information concerning user interaction with the particular web page, determine whether the 
password received from the website owner is valid. 

25. (Previously Presented) The software of Claim 23, operable to: 

receive, from the website owner, one or more specified filter criteria applicable to the 
collected information concerning user interaction with the particular web page; and 

filter the collected information concerning user interaction with the particular web 
page according to the specified filter criteria such that the website owner is presented only 
particular collected information concerning user interaction with the particular web page 
matching the specified filter criteria. 

26. (Previously Presented) The software of Claim 25, wherein at least one of the 
filter criteria comprises a time frame associated with the collected information concerning 
user interaction with the particular web page. 

27. (Previously Presented) The software of Claim 23, wherein the collected 
information concerning user interaction with the particular web page is user traffic 
information.

28. (Previously Presented) The software of Claim 23, wherein the collected
information concerning user interaction with the particular web page is user feedback 
information concerning the particular web page. 



29. (Previously Presented) The software of Claim 28, operable, if the website 
owner is authorized to access the collected user feedback information, to: 

generate a report of the collected user feedback information; and 

present the report to the website owner to provide the website owner access to the
collected user feedback information while the particular web page is viewable within the
browser window of the web browser.

30. (Original) The software of Claim 29, wherein the report comprises one or 
more of: 

a first display of a time frame associated with the collected user feedback information;

a second display providing an overview of the collected user feedback information; and

a third display of one or more sliding bars that each correspond to a particular type of 
collected user feedback information and indicate percentages of negative, neutral, and 
positive user feedback information of the corresponding particular type of collected user 
feedback information.

31. (Previously Presented) The software of Claim 29, wherein the report
comprises a display window operable to display one or more of: 

one or more charts of one or more general or specific user ratings of the particular 
web page; 

user comments regarding the particular web page; and 

one or more survey displays of user answers to one or more explicit questions 
regarding the particular web page. 

32. (Previously Presented) The software of Claim 23, wherein the input indicating 
a desire to access the collected information concerning user interaction with the particular 
web page comprises entry of one or more particular keystrokes using a keyboard. 

33. (Previously Presented) The software of Claim 23, wherein: 
the particular web page comprises a first web page; and 

the website comprises one or more other web pages in addition to the first web page; 
the software operable to: 

receive, from the website owner, a specification of one or more of the other 
web pages; and 

provide the website owner access from the first web page to collected 
information concerning user interaction with the specified other web pages in addition 
to the collected information concerning user interaction with the first web page.

34. (Previously Presented) A system for providing substantially real-time access
to collected information concerning user interaction with a particular web page of a website, 
the system comprising: 

means for collecting information concerning user interaction with a particular web 
page of a website; and 

means for receiving, from a website owner who has accessed the particular web page 
using a web browser and while the particular web page is viewable within a browser window 
of the web browser, input indicating a desire to access the collected information concerning 
user interaction with the particular web page; 

means for determining whether the website owner is authorized to access the collected 
information concerning user interaction with the particular web page; and 

means for, if the website owner is authorized to access the collected information 
concerning user interaction with the particular web page: 

generating a viewable user interface providing substantially real-time access to
the collected information concerning user interaction with the particular web page; and

to provide the website owner substantially real-time access to the collected 
information concerning user interaction with the particular web page, presenting the 
viewable user interface to the website owner in substantially real-time in response to 
the input received from the website owner while the particular web page is viewable 
within the browser window of the web browser.

ABSTRACT

A system and method for monitoring and analyzing Internet
traffic is provided that is efficient, completely automated,
and fast enough to handle the busiest websites on the
Internet, processing data many times faster than existing
systems. The system and method of the present invention
processes data by reading log files produced by web servers,
or by interfacing with the web server in real time, processing
the data as it occurs. The system and method of the present
invention can be applied to one website or thousands of
websites, whether they reside on one server or multiple
servers. The multi-site and sub-reporting capabilities of the
system and method of the present invention makes it applicable
to servers containing thousands of websites and entire
on-line communities. In one embodiment, the system and
method of the present invention includes e-commerce analysis
and reporting functionality, in which data from standard
traffic logs is received and merged with data from
e-commerce systems. The system and method of the present
invention can produce reports showing detailed "return on
investment" information, including identifying which banner
ads, referrals, domains, etc. are producing specific
dollars.

SYSTEM AND METHOD FOR MONITORING AND ANALYZING INTERNET TRAFFIC

This application claims the benefit of Provisional application Ser. No. 60/157,649,
filed Oct. 4, 1999.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to Internet traffic and, more specifically, to a system and
method for monitoring and analyzing Internet traffic.

2. Description of Related Art

Internet web servers such as those used by Internet Service Providers (ISP), are typically
configured to keep a log of server usage by the on-line community. For example, as a
visitor to a website clicks on various hyperlinks and travels through a website, each step
is recorded by the web server in a log. Each web page, image and multimedia file viewed by
the visitor, as well as each form submitted, may be recorded in the log.

The type of information logged generally includes the Internet Protocol (IP) address or
host name of the visitor, the time of the transaction, the request, the referring page, the
web browser and type of platform used by the visitor, and how much data was transferred.
When properly analyzed, this information can help marketing executives, webmasters,
system administrators, business owners, or others make critical marketing, business,
commerce and technical decisions. The data can be mined for all types of decision
supporting information, e.g. analyzing which webbrowsers people are using, determining
which banner ads are producing the most traffic, etc.

A problem with mining the raw log data for useful information is the sheer volume of data
that is logged each day. ISPs may have dozens of web servers containing thousands of
websites that produce gigabytes of data each day. Providing a robust system that can be
used on various platforms, that can efficiently process the huge amounts of data that are
logged, and that can produce easy to use reports for each website in an automated fashion
is a daunting task.

BRIEF SUMMARY OF THE INVENTION

In view of the above problems in the art, the present invention provides a system and
method for monitoring and analyzing Internet traffic that is efficient, completely
automated, and fast enough to handle the busiest websites on the Internet, processing data
many times faster than existing systems.

The system and method of the present invention processes data by reading log files
produced by web servers, or by interfacing with the web server in real time, processing
the data as it occurs. The system and method of the present invention can be applied to
one website or thousands of websites, whether they reside on one server or multiple
servers. The multi-site and sub-reporting capabilities of the system and method of the
present invention makes it applicable to servers containing thousands of websites and
entire on-line communities.

The system and method of the present invention can create reports for individual websites,
as well as reports for all of the websites residing on a single server or multiple
servers. The system can also create reports from a centralized system, in which reports
are delivered upon request directly from the system database via a Common Gateway
Interface (CGI).

The system and method of the present invention can also include real-time analysis and
reporting functionality in which data from web servers is processed as it occurs. The
system and method of the present invention can produce animated reports showing current
activity on the web server, which can be used by administrators and managers to monitor
website effectiveness and performance.

The system and method of the present invention can further include e-commerce analysis
and reporting functionality in which data from standard traffic logs is received and
merged with data from e-commerce systems. The system and method of the present invention
can produce reports showing detailed "return on investment" information, including
identifying which banner ads, referrals, domains, etc. are producing specific dollars.

The present invention can be achieved in whole or in part by a system for analyzing and
monitoring internet traffic, comprising a relational database, a log engine that processes
log files received from at least one internet server and stores data processed from the
log files in the relational database; and a report engine that generates reports based on
the processed data stored in the relational database. The system and method of the present
invention preferably utilizes Visitor Centric Data Modeling, which keeps data associated
with the visitor that generated it, and that allows for the cross-comparing of different
elements of data coming from different log entries or different log files altogether.

The accompanying drawings, which are incorporated in and constitute a part of this
specification, illustrate embodiments of the invention and, together with the description,
serve to explain the principles of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of a system for monitoring and analyzing Internet traffic,
in accordance with the present invention;

FIG. 2 is a schematic diagram of a series of hash tables stored by the database shown in
FIG. 1;

FIG. 3 is a block diagram of a preferred embodiment of the log engine shown in FIG. 1;

FIG. 4 is a flowchart and schematic diagram illustrating a preferred control routine for
the log parser module of FIG. 3;

FIG. 5 is a flowchart and schematic diagram of a preferred control routine for the read
line step of FIG. 4, for accessing and processing log file data in real time;

FIG. 6 is a flowchart and schematic diagram illustrating a preferred control routine for
the website identification module of FIG. 3;

FIG. 7 is a flowchart and schematic diagram illustrating a preferred control routine for
the visitor identification module of FIG. 3;

FIG. 8 is a flowchart and schematic diagram illustrating a preferred control routine for
the buffer update module of FIG. 3;

FIG. 9 is a schematic representation of the contents of the database buffer shown in
FIG. 3;

FIG. 10 is a schematic diagram illustrating the operation of the DNS resolver module of
FIG. 3;

FIG. 11 is a flowchart and schematic diagram of a feedback loop control routine preferably
used by the DNS resolver module of FIG. 3;

FIG. 12 is a schematic diagram of how a preferred embodiment of an adaptable resolution
mechanism in the DNS resolver module operates;

FIG. 13 is a flowchart of preferred control routines for various control loops within the
DNS resolver module of FIG. 3;

FIG. 14 is a flowchart and schematic diagram illustrating a preferred control routine for
the database update module of FIG. 3;

FIG. 15 is a schematic diagram illustrating the main components of the database shown in
FIG. 1;

FIG. 16 is a schematic diagram of a preferred embodiment of the report engine of FIG. 1;

FIG. 17 is a flowchart of a preferred control routine for the session parser module of
FIG. 16;

FIG. 18 is a flowchart of a preferred control routine for the authentication module of
FIG. 16;

FIG. 19 is a flowchart of a preferred control routine for the data query module of
FIG. 16;

FIG. 20 is a flowchart of a preferred control routine for the format output module of
FIG. 16;

FIG. 21 is a schematic diagram of a preferred embodiment of a Javascript system used by
the report engine of FIG. 16;

FIG. 22 is an example of a visitor monitor report created by the system of the present
invention;

FIG. 23 is an example of a temporal visitor drill down report created by the system of the
present invention;

FIG. 24 is an example of a visitor footprint report created by the system of the present
invention;

FIG. 25 illustrates an example of a system meter report created by the system of the
present invention;

FIG. 26 shows visitor table containing e-commerce data, and residing in the database
buffer;

FIG. 27 shows an example of an ROIR e-commerce report generated by the system of the
present invention;

FIG. 28 shows an example of a snapshot report generated by the system of the present
invention;

FIG. 29 shows an example of a user interface and an hourly graph report generated by the
system of the present invention;

FIG. 30 shows an example of a top pages report generated by the system of the present
invention;

FIG. 31 shows an example of a directory tree report generated by the system of the present
invention;

FIG. 32 shows an example of a search engines report generated by the system of the present
invention;

FIG. 33 shows an example of a top domains report generated by the system of the present
invention;

FIG. 34 shows an example of a browser tree report generated by the system of the present
invention;

FIG. 35 shows an example of a top entrances report generated by the system of the present
invention; and

FIG. 36 shows an example of a top products report generated by the system of the present
invention.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 1 illustrates a system 100 for monitoring and analyzing Internet traffic, in
accordance with the present invention. The system 100 comprises a log engine 200, a
database 300 and a report engine 400.

In operation, log files 510 generated by web servers 500 are sent to the log engine 200.
Web (Internet) traffic is served by the web server 500. The web server 500 can host one or
many individual websites. As visitors access the web servers 500 for content, each website
hit or transaction is appended to a log. Each web server will typically have its own log
file. Multiple websites on a single server could be logged centrally in one log file, or
could be configured so that each website has its own log file. The system 100 is able to
handle all of these different architectures.

The entries on each of the log files 510 are interleaved so that individual website hits
or transactions are recorded in the order they are received. If a single log file contains
log entries from multiple websites, the log entries are also interleaved so that
individual hits or transactions from each website are recorded in the order they are
received. Each line in the log files represents a hit or a transaction from the website on
one of the web servers 500.

In addition to normal web traffic, many websites contain e-commerce enabled virtual
"shopping carts" that allow visitors to securely buy products directly from the website.
The system 100 can optionally analyze the demographics of on-line shopping by receiving
e-commerce log files 580 produced by e-commerce enabled websites. The e-commerce log files
580 are transaction logs that contain information about each order placed on the website.
Each of the e-commerce log files 580 generally contains data on the pricing of products
purchased, dollar amounts and shipping [illegible]. Sensitive information such as credit
numbers, individual names and e-mail addresses are generally not stored on the e-commerce
log files 580. Dashed lines are used to represent the e-commerce log files 580 to indicate
that the e-commerce functionality is an optional feature of the system 100.

The preferred embodiment of the log engine 200 is responsible for processing all of the
log files 510 and 580, domain name system (DNS) resolving and updating the database 300.
The log engine 200 utilizes memory buffers, fixed-width data models and other techniques
to efficiently process the log files 510 and 580. In addition, the log engine 200 can be
optionally configured to access live data. The operation of the log engine 200 will be
described in more detail below.

The log engine 200 efficiently reads each line in each of the log files 510 and separates
each line into its individual parts. The individual parts can include fields such as the
IP address, time stamp, bytes sent, [illegible]. The log engine 200 utilizes a technique
called Visitor Centric Data Modeling. Rather than parsing each log line and counting how
many of one type of browser was used or how many times a particular webpage was viewed,
Visitor Centric Data Modeling keeps that data associated with the visitor that generated
it. One of the primary advantages of Visitor Centric Data Modeling is the ability to cross
compare different elements of data coming from different log entries or different log
files altogether. Visitor Centric Data Modeling allows one to determine what percentage of
visitors that originated from a Yahoo™ search looked at a particular webpage.

A second benefit of Visitor Centric Data Modeling is reduction of overall data processing.
Because many elements of the data will be the same during a visitor's visit, the
information only needs to be processed once per visitor, rather than once per log line.
For example, the primary domain name of the visitor will be the same for each log entry
produced by a particular visitor. Visitor Centric Modeling allows one to process this
information only once per visitor. Additional details on how the log engine 200 utilizes
the Visitor Centric Data Modeling will be provided below.

The log engine 200 processes each log entry and updates the database 300. The database 300
contains a series of hash tables. The database 300 comprises a series of hash tables,
as shown in FIG. 2. The hash tables comprise a visitor table 
310 and associated data tables 315. 

The visitor table 310 contains the central record for each
visitor to a website. The hits, bytes, page views, and other
fixed data parameters (hereinafter collectively referred to as
"traffic information") are stored directly in the visitor table
310. The remaining non-unique parameters, e.g., domain
names, types of web browsers, referring web sites, etc., are
stored relationally in respective data tables 315. For
example, one of the data tables 315 could be configured to
store a list of the different domain names from which the
visitors to the website being monitored by the system 100
originate, while another of the data tables 315 could be
configured to store the names of the different types of web
browsers used by the visitors to the web site being monitored
by the system 100.

The database 300 is relational and centers the data in the
visitor table 310, creating a Visitor Centric Data Model. The
visitor table 310 contains a hash table 320 that is used for
quickly seeking visitor records. Below the hash table 320,
the actual records 325 contain the traffic information of each
visitor. Each unique visitor will have their own record in the
visitor table 310.
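
As a rough illustration of the visitor-centric idea described above, the following Python sketch keeps one record per visitor and then cross-compares the referrer against the pages viewed. The field names, the use of the IP address as the visitor key, and the sample entries are assumptions made for illustration only; they are not taken from the reference.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class VisitorRecord:
    """Central record for one visitor (traffic information plus parameters)."""
    hits: int = 0
    bytes_sent: int = 0
    referrer: str = ""                      # set once per visitor, not once per hit
    pages: set = field(default_factory=set)


def build_visitor_table(entries):
    """Group parsed log entries by a visitor key (assumed here to be the IP)."""
    visitors = defaultdict(VisitorRecord)
    for ip, page, nbytes, referrer in entries:
        record = visitors[ip]
        if record.hits == 0:
            record.referrer = referrer      # the first hit fixes the entry referrer
        record.hits += 1
        record.bytes_sent += nbytes
        record.pages.add(page)
    return dict(visitors)


def share_viewing(visitors, referrer_substring, page):
    """Fraction of visitors from a given referrer who viewed a given page."""
    cohort = [v for v in visitors.values() if referrer_substring in v.referrer]
    if not cohort:
        return 0.0
    return sum(page in v.pages for v in cohort) / len(cohort)


# Hypothetical parsed log entries: (ip, page, bytes, referrer)
entries = [
    ("10.0.0.1", "/index.html", 1200, "http://search.yahoo.com/?q=shoes"),
    ("10.0.0.1", "/products.html", 3400, "/index.html"),
    ("10.0.0.2", "/index.html", 1200, "http://search.yahoo.com/?q=boots"),
    ("10.0.0.3", "/products.html", 3400, "http://example.org/links"),
]
visitors = build_visitor_table(entries)
ratio = share_viewing(visitors, "yahoo.com", "/products.html")
```

Because the referrer is stored with the visitor rather than counted per log line, a question that spans several log entries reduces to a single pass over the visitor records.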

The visitor table 310 is relational in nature and has a
relations area 330 that contains pointers 335 to records 350
within the data tables 315. As discussed above, each of these
data tables 315 stores different visitor parameters such as
domain, browser, and referral. Besides vastly reducing the
storage requirements relative to a non-relational database,
the data tables 315 can be used to create statistical reports on
the usage of different visitor parameters.

Each data table 315 contains a hash table 340, a rank table
345, a record table 350, and a string table 355. The hash
table 340 is used to seek records in the record table 350. The
rank table 345 is used to keep track of the top entries in the
record table 350 based on the number of pointers 335 set to
the records in the record table 350. This is useful for quick
access to reports. The record table 350 stores the actual
records within the data table 315, including the traffic
information associated with the parameter associated with the
data table 315. The record table 350 does not store the value
of the parameter. Instead, the record table 350 contains a
pointer to a record in the string table 355. Each of these
subtables (320, 325, 340, 345, 350, 355) has fixed width
records, allowing for efficient reading, writing, and copying
of the entire data sets.
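
As a rough illustration of the data table layout described above, the following Python sketch models a single data table 315 with its subtables. The class and method names (DataTable, add_reference, top) are hypothetical, and ordinary Python dicts and lists stand in for the fixed width hash, record, and string subtables, so this is only a sketch of the idea under those assumptions and not the actual implementation. The ranking is also computed on demand here rather than maintained in a separate rank table.

```python
from dataclasses import dataclass, field


@dataclass
class DataTable:
    """Hypothetical model of one data table 315 (e.g., browser names)."""
    hash_table: dict = field(default_factory=dict)  # value -> record index (role of hash table 340)
    records: list = field(default_factory=list)     # [string index, pointer count] (role of record table 350)
    strings: list = field(default_factory=list)     # parameter values (role of string table 355)

    def add_reference(self, value: str) -> int:
        """Return the record index for value and count one more visitor pointer."""
        idx = self.hash_table.get(value)
        if idx is None:
            # The value itself is stored once, in the string table.
            self.strings.append(value)
            self.records.append([len(self.strings) - 1, 0])
            idx = len(self.records) - 1
            self.hash_table[value] = idx
        self.records[idx][1] += 1
        return idx

    def top(self, n: int) -> list:
        """Top-n entries by pointer count (the role played by rank table 345)."""
        ranked = sorted(self.records, key=lambda r: r[1], reverse=True)
        return [(self.strings[s], count) for s, count in ranked[:n]]


browsers = DataTable()
# Each visitor record would keep the returned index as one of its pointers 335.
visitor_pointers = [browsers.add_reference(b) for b in ["Netscape", "MSIE", "Netscape"]]
print(visitor_pointers)
print(browsers.top(1))
```

Because each visitor record holds only a small index instead of the full string, repeated values such as a browser name are stored once, which is the storage saving the text attributes to the relational layout.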

The relational structure of the database 300 has at least
two advantages. First, the visitor table 310 simplifies the
task of processing each hit because, once the visitor is
identified, the appropriate record in the visitor table 310 can
be located and updated accordingly. Second, the data tables 315
simplify the task of report generation, because each of the data
tables 315 stores a specific parameter (e.g., the names of the
web browsers used by the visitors) and is ranked. Thus,
each of the data tables 315 can easily deliver the top list of
entries for a particular report.

Referring back to FIG. 1, once the log files 510, and 
optionally the e-commerce log files 580, are processed by 
the log engine 200, and the database 300 is updated, the
system 100 is ready to deliver reports based on the updated 
information in the database 300. A user 530 sends a report 
request 540 to the report engine 400 via a web server 520. 
The report engine 400 obtains the data required to generate 
the report from the database 300, generates the report, and
delivers the generated report 550 to the user 530 via the web 
server 520. 

The web server 520 can optionally be one of the web
servers 500 that created the log files 510 and 580. The report
engine 400 preferably utilizes JavaScript application
techniques, dictionaries, and templates to provide flexible, 
efficient, customizable and attractive reports, as will be 
explained in more detail below. Reports are generated on the 
fly when requested by the user 530 using the standard 
Common Gateway Interface (CGI) of the web server 520. 
Storage requirements are kept small as all HTML and 
graphics for the reports are generated as needed. 

Log Engine (200) 

FIG. 3 is a block diagram of a preferred embodiment of 
the log engine 200. 

The log engine 200 preferably comprises a log parser module
210, a website identification module 220, a visitor
identification module 230, a buffer update module 240, a
database buffer 250, a DNS resolver module 260 and a database
update module 270.

The log parser module 210 is responsible for the actual 
reading and processing of the log files 510 and the 
e-commerce log files 580. The log parser module 210 can be 
configured to process either static log files or log files that 
are being generated live in real-time. The log parser module 
210 loads each log line from the log files 510 and 580 and 
separates each log line into its individual fields. 
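
One way to picture this field separation is the following Python sketch. It reuses a single preallocated buffer and an offsets array for every line, replaces each separator with a zero byte, and records where each field starts. The names LOG_BUFFER, FIELD_OFFSETS, split_in_place and field, the buffer sizes, and the choice of a space as the separator are all assumptions made for illustration. Python still copies bytes when a field is decoded, so the sketch shows the indexing scheme rather than a truly allocation-free parser.

```python
MAX_LINE = 4096
MAX_FIELDS = 32

LOG_BUFFER = bytearray(MAX_LINE)   # allocated once, reused for every log line
FIELD_OFFSETS = [0] * MAX_FIELDS   # start offset of each field in LOG_BUFFER


def split_in_place(line: bytes, sep: int = ord(" ")) -> int:
    """Copy line into LOG_BUFFER, zero out separators, return the field count."""
    n = min(len(line), MAX_LINE - 1)
    LOG_BUFFER[:n] = line[:n]
    LOG_BUFFER[n] = 0                       # terminate the line
    for i in range(MAX_FIELDS):
        FIELD_OFFSETS[i] = 0                # clear all offsets before use
    count = 1                               # the first field starts at offset 0
    for i in range(n):
        if LOG_BUFFER[i] == sep and count < MAX_FIELDS:
            LOG_BUFFER[i] = 0               # the separator becomes a terminator
            FIELD_OFFSETS[count] = i + 1    # the next field starts just after it
            count += 1
    return count


def field(k: int) -> str:
    """Read field k as a zero-terminated string out of LOG_BUFFER."""
    start = FIELD_OFFSETS[k]
    end = LOG_BUFFER.index(0, start)
    return LOG_BUFFER[start:end].decode("ascii")


count = split_in_place(b"123.12.3.1 2000-08-02 /index.html 200 1234")
print(count, field(0), field(2), field(4))
```

A single pass over the line both marks the separators and records the field positions, so no per-field buffers are created while scanning.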

The website identification module 220 is primarily used 
when multiple websites are being logged to the same file. A 
class of web hosting known as "virtual hosting" or "shared 
hosting" allows ISPs to offer solid performing website 
hosting service at reasonable prices. By setting up a robust 
set of servers with virtual hosting capable software, ISPs can 
place multiple websites on the servers, thus allowing the 
website owners to share the cost of the servers, maintenance, 
and networking. 

However, as ISPs squeeze more and more websites onto
a server in order to generate profit in an increasingly
competitive industry, creating a system that is scalable
becomes more and more difficult. One problem that
administrators soon face is the number of log files open during
operation. Typically they will have at least one log file 510
for each website. As they add hundreds or thousands of 
websites to a server, the handling of all log files 510 
becomes difficult. Moving, rotating and archiving all of the 
individual log files 510 becomes a burden. Also, system 
performance is compromised as resources are allocated to 
each open log file (many systems have a hard limit to the 
number of files that can be open simultaneously). 

To solve this problem, the system and method of the 
present invention utilizes Subreport/Multisite Reporting 
Technology. This technology allows hosting providers to 
centralize the logging for all websites. Each server can have 
just one log file 510 for all websites, keeping resources in 
check. There is just one log file 510 to manage, rotate, 
process and archive, thus making the administrator's duties 
easier, less expensive and more scalable. 

This website identification module 220 identifies each hit 
as belonging to a particular website. If the log file 510 or 
e-commerce log file 580 has data from only one website, 
then the task is simple and is handled through straight 
configuration. However, if the log file 510 or e-commerce 
log file 580 contains data from multiple websites, then the 
website identification module 220 employs a series of regular
expression filters to perform the website identification.
The website identification module 220 must be flexible and 
be able to pull any consistent part of the log file 510 for 
website identification. The website identification performed
by the website identification module is later used to
determine what portion of the database 300 to write the data to.

As discussed above, the log engine 200 utilizes Visitor
Centric Data Modeling. The first step in using a Visitor
Centric Data Model is to be able to identify the specific
visitor within each log file line. The visitor identification
module 230 analyzes the fields in each hit (log file line) and
identifies the hit as belonging to a new or existing visitor.
Based on a unique identifier, such as an IP number or session
id and a timestamp, the visitor identification module 230
determines which visitor record in the database 300 will
need to be updated. If the timestamp of the hit is within a
predetermined amount of time (e.g., 30 minutes) of an
existing visitor, then the hit is considered as coming from
that visitor.

The buffer update module 240 updates the parameters of
the visitor record found by the visitor identification module
230 and stored on the database buffer 250 with the current
hit's information. The timestamp of the hit is used to keep
the chronological order of events intact.

The database buffer 250 is a volatile storage area,
preferably RAM memory, that mirrors the actual database 300.
At the beginning of processing, current data is read from the
database 300 into the database buffer 250. After processing
is complete, data is written back to the database 300. The
purpose of the database buffer 250 is to speed up the
processing of each hit. Instead of accessing the actual
database 300 for each hit in the log file 510 or e-commerce
log file 580, the database buffer 250 allows the log engine
200 to build up the data in the faster RAM memory location
of the database buffer 250 and then flush data to the database
300 in larger chunks. The operation of the database buffer
250 will be explained in more detail below.

Before outputting the data to the database 300, the data is
passed through the DNS resolver module 260 for reverse
DNS resolution of IP addresses. Most web servers log only
the IP address of the visitor and not the host and domain
information. The domain information provides valuable data
about the physical and network location of visitors. The
DNS resolver module 260 employs a customized resolution
routine designed specifically to speed up the process of
typically slow DNS operations.

The database update module 270 performs the task of
updating the database with the contents of the database
buffer 250. The database update module 270 performs some
processing (e.g., visitor sorting) before writing to the
database 300.

Preferred control routines for the log parser module 210,
website identification module 220, visitor identification
module 230, buffer update module 240, DNS resolver module
260 and database update module 270 will be described
below.

Log Parser Module (210)

FIG. 4 is a flowchart and schematic diagram illustrating a
preferred control routine for the log parser module 210 of
FIG. 3, configured to process static log files 510. One of the
most time consuming operations is reading and processing
the raw log files 510. With individual log files 510
containing potentially over a gigabyte of data, getting the raw
data into the system 100 is an important step.

The purpose of the log parser module 210 is to efficiently
read each log line 512 and separate it into its individual
fields. The fields can include the IP address, timestamp,
bytes sent, status code, referral, etc. As discussed above,
each log line 512 in the log file 510 represents a hit or
transaction from one of the web servers 500.

The log parser module 210 employs a log buffer 600 and
a pointer array 610 that are reused for each log line 512 in the
log file 510. Thus, memory allocation for this log parser
module 210 is only done at startup. The states of the log
buffer 600 and pointer array 610 at each step in the control
routine shown in FIG. 4 are represented schematically under
the corresponding step in the control routine.

The control routine starts at step 620, where the
pre-allocated log buffer 600 and the pointer array 610 are
cleared. The log buffer 600 is cleared by setting the first
character in the log buffer 600 to zero. The pointer array 610
is cleared by setting the values of all the individual pointers
612 to zero. It is important for stable processing to set all of
the pointers in the pointer array 610 to zero before using the
pointer array 610.

The control routine then continues to step 630, where the
next log line 512 in the log file 510 is read into the log buffer
600. For a log parser module 210 that is configured to
process static log files 510, this is accomplished using
standard file access library calls.

The control routine then continues to step 640, where the
field spacers are identified in the log buffer 600 and marked.
Field spacers could be spaces, tabs, commas, or anything that
can be used as the separator between the fields in the logging
format.

At step 650, the marked field spacers are replaced with a
zero and the appropriate pointer 612 is set to the next
character in the log buffer 600. Although steps 640 and 650
are shown as separate steps for purposes of illustration, they
are preferably performed at substantially the same time.
Thus, with a single loop and without moving, copying or
allocating any memory, the log buffer 600 containing the
single log line 512 is converted into a series of smaller
character strings, each representing a particular field 602,
and with each zero terminated.

The pointers 612 in the pointer array 610 can then be used
to access the fields 602 as if they were separate strings.
Accordingly, with minimal processing and absolutely no
iterative memory allocation, each log line 512 is read and
efficiently separated into its fields 602.

Real-Time Control Routine for Log Parser Module (210)

FIG. 5 is a flowchart and schematic diagram of a preferred
control routine for the read line step of FIG. 4, for accessing
and processing log file data in real time. A web server 500
under normal configuration is shown. The web server 500
handles all requests as they come in and logs each hit to the
log file 510 by appending the log file 510 with data from
each request.

The built-in log file 510 acts as a buffer. It is the simplest
and most robust way to pass data between the web server
500 and the live data access routine 700. The live data access
routine 700 can be turned on or off at will. Once started, the
live data access routine 700 runs as a low priority daemon.
The live data access routine 700 can exist in two states: wait
710 and process 720, toggling between the two as data
arrives into the buffer 510.

As long as more data exists in the log file 510, the system
will stay in the process loop 720. The control routine starts
at step 730, where the system checks for an "End of File"
mark in the log file 510. As long as this mark is not detected,
control moves to read step 740, where the next line in the log
file 510 is read into the system. Control then continues to the
finish control routine step 750, which finishes the control
routine steps in the log parser control routine of FIG. 4,
starting with the mark fields step 640 in FIG. 4. All of the
read, write and EOF routines are autonomous, which means
the web server 500 can continue to write new data to the end
of the log 510 during the live data access routine 700.

Once the live data access routine 700 catches up and
finishes the log file 510 by reaching the "End of File" 
marker, control moves to truncate step 760, where the log 
file 510 is immediately truncated. The truncation call sets the 
size of the log file 510 to zero. Since appended files always 
check file sizes before writing, the next write from the web 
server 500 will automatically start at the beginning of the log 
file 510. Control then moves to delay step 770, which delays
the control routine for a configurable amount of time
(typically 1 second or less). After this delay interval, control
returns to the EOF step 730, where the existence of new data 
is checked. 

As long as the log file 510 is empty, the live data access
routine 700 will remain in the wait loop 710. In this manner,
the live data access routine 700 has real-time access to the
data written by the web server 500, while remaining at arm's
length from the web server 500 itself.
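
The wait and process states described above can be sketched in Python roughly as follows. The function names process_available and live_data_access, the handle_line callback, and the cycles parameter (added only so the loop can be stopped in a demonstration) are assumptions for illustration. The sketch also ignores the possibility that the web server appends a line between the last read and the truncation, which a production routine would need to consider.

```python
import os
import time


def process_available(path: str, handle_line) -> int:
    """Process state 720: read lines until end of file, then truncate the log."""
    processed = 0
    with open(path, "r+") as log:
        for line in log:                    # read step 740, repeated until EOF
            handle_line(line.rstrip("\n"))  # hand the line on to field parsing
            processed += 1
        log.truncate(0)                     # truncate step 760: size set to zero
    return processed


def live_data_access(path: str, handle_line, delay: float = 1.0, cycles=None):
    """Toggle between process and wait states; cycles=None runs indefinitely."""
    done = 0
    while cycles is None or done < cycles:
        if os.path.exists(path) and os.path.getsize(path) > 0:
            process_available(path, handle_line)
        else:
            time.sleep(delay)               # delay step 770 in the wait state 710
        done += 1
```

Using the log file itself as the hand-off point means the routine needs no connection to the web server process; it only needs read and write access to the file.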

Website Identification Module (220) 

FIG. 6 is a flowchart and schematic diagram illustrating a 
preferred control routine for the website identification mod- 
ule 220 of FIG. 3, which is designed to identify the website 
that created each log line 512 in a log file 510. The log lines 
512 are interleaved and written to the log file 510 as hits 
occur. The format of the log file 510 may vary from provider 
to provider. Some may use the canonical domain name in the 
log file 510, while others will use a subdirectory in the URI 
to identify the website. 

There are three configuration variables that pertain to the 
control routine shown in FIG. 6. The subreport field (SF) 
specifies which field in the log file 510 contains the website 
identifier text. The subreport expression (SE) is a POSIX 
extended regular expression that is used to capture all or part 
of the field specified by SF. The report name expression
(RN) is used to build the website name from the information 
captured by SE. 

As discussed above, the log parser module 210 processes 
each log line 512 one at a time, and separates the log line 512 
into separate fields 602. In the log file 510 shown in FIG. 6, 
log line field 602' contains the website identifier text, and is 
also indicated in FIG. 6 with shading. 

The control routine for the website identification module 
begins at step 800, where log line field 602' is selected using 
the SF configuration variable. The control routine then 
continues to step 810, where the subreport expression (SE) 
is applied to the log line field 602' selected at step 800. This 
is done using POSIX extended regular expressions. The 
operator of the system 100 will need to be familiar with 
regular expressions or seek assistance from the manuals or 
technical support. The SE expression is used to match part
or all of log line field 602'. Parentheses are used to define
what is to be matched. For example, to simply capture the
entire field, the SE expression "(.*)" would be used.
By contrast, to capture the last parts of a "www" domain name,
the expression "www\.(.*)" could be used. Whatever is
matched inside the parentheses is placed into a first variable
$1. If there are multiple sets of parentheses, then subsequent
matched components are placed into additional variables 
(e.g., $2, etc.). In the example shown in FIG. 6, two 
variables, $1 and $2, are used. 

Next, at step 820, the $1 and $2 variables are used to 
generate the name 830 of the website. Using the report name 
expression (RN), the variables $1 and $2 are replaced with 
the actual contents of the matched components. For 
example, if the following configuration parameters are set: 

    SF=2

    SE=SITE:(.*)

    RN=www.mydomain.com/$1

and the following space-separated log line was processed:

    123.12.3.1 2000-08-02 SITE:human-resources /index.html 200 1234

the website identification module 220, at step 800, would
select "SITE:human-resources" as log line field 602' in the
log line 512. The SE would capture everything after the
"SITE:" part of log line field 602', as defined by the location
of the parentheses in the SE expression. This information is
placed into the $1 variable. The website name 830 is then
identified at step 820 by expanding the RN expression and
replacing the $1 variable with the actual contents of the
match. In this example, the resulting website name 830 is
"www.mydomain.com/human-resources".
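
The example above can be reproduced with a short Python sketch. The function name identify_website is hypothetical. The sketch assumes that SF counts fields from zero and that the site identifier and the request path are separate space-separated fields, since that is what makes "SITE:human-resources" the selected field. It also uses Python's re module in place of POSIX extended regular expressions; the two agree for simple patterns such as this one but are not identical in general.

```python
import re
from typing import Optional


def identify_website(log_line: str, sf: int, se: str, rn: str) -> Optional[str]:
    """Apply the SF, SE and RN settings to one space-separated log line."""
    fields = log_line.split(" ")
    if sf >= len(fields):
        return None
    match = re.search(se, fields[sf])   # step 810: apply the subreport expression
    if match is None:
        return None
    # Step 820: replace $1, $2, ... in RN with the captured groups.
    return re.sub(r"\$(\d+)", lambda m: match.group(int(m.group(1))), rn)


name = identify_website(
    "123.12.3.1 2000-08-02 SITE:human-resources /index.html 200 1234",
    sf=2,
    se=r"SITE:(.*)",
    rn="www.mydomain.com/$1",
)
print(name)  # www.mydomain.com/human-resources
```

Lines whose selected field does not match SE return None here, which a caller could treat as belonging to no configured website.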
Visitor Identification Module (230) 

FIG. 7 is a flowchart and schematic diagram illustrating a 
preferred control routine for the visitor identification module 
230 of FIG. 3. The log file 510 contains a number of log lines 
512 or hits. Because the log lines 512 are interleaved, each 
log line 512 can be from a different visitor. As discussed 
above, the log parser module 210 processes each log line 
512 in the log file 510, and places the information in the log 
buffer. The log line fields 602 are separated and the data is 
passed to the visitor identification module 230. 

In the log file 510 shown in FIG. 7, log line field 602"
contains the ID value and log line field 602'" contains the
timestamp of the hit. Log line fields 602" and 602'" are also
indicated in FIG. 7 with shading. 

The control routine for the visitor identification module 
230 begins at step 900, where log line fields 602" and 602'" 
are selected, as represented schematically under the Identify 
step 900 in FIG. 7. The control routine then continues to step 
910, where the control routine looks up the ID value 602" in 
the visitor hash table 320 of the visitor table 310 (shown in 
FIG. 2). If the ID value 602" does not exist in the visitor hash 
table 320, control continues to step 920, where a new visitor 
record is created in the visitor hash table 320. If the ID value 
602" does exist in the visitor hash table, control skips to step 
930. 

At step 930, the timestamp 602'" of the log line 512 is 
checked against the time range of the visitor record in the 
visitor hash table that corresponds to the ID value 602". If 
the timestamp 602'" falls within a predetermined allowable
range, control continues to step 940, where the visitor record 
identified by the ID value 602" in the visitor hash table is 
determined to be the existing visitor. Otherwise, control 
jumps back to step 910, where the seek continues through 
records not previously searched until either a new record is 
created or another existing visitor is found. 
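
The lookup described above might be sketched in Python as follows. The names VisitorRecord, identify_visitor and SESSION_WINDOW are hypothetical, a dict of lists stands in for the visitor hash table 320, and measuring the allowable range from the first and last hits of a record is an assumption; the text only states that the timestamp must fall within a predetermined range of an existing visitor.

```python
from dataclasses import dataclass
from typing import Dict, List

SESSION_WINDOW = 30 * 60  # seconds; 30 minutes is the example given in the text


@dataclass
class VisitorRecord:
    visitor_id: str
    first_seen: int
    last_seen: int


def identify_visitor(table: Dict[str, List[VisitorRecord]],
                     visitor_id: str, timestamp: int) -> VisitorRecord:
    """Return the record whose time range admits the hit, or create a new one."""
    for record in table.setdefault(visitor_id, []):       # step 910: seek by ID
        if (record.first_seen - SESSION_WINDOW
                <= timestamp
                <= record.last_seen + SESSION_WINDOW):    # step 930: check range
            record.first_seen = min(record.first_seen, timestamp)
            record.last_seen = max(record.last_seen, timestamp)
            return record                                 # step 940: existing visitor
    record = VisitorRecord(visitor_id, timestamp, timestamp)  # step 920: new visitor
    table[visitor_id].append(record)
    return record


table: Dict[str, List[VisitorRecord]] = {}
a = identify_visitor(table, "10.0.0.1", 1000)
b = identify_visitor(table, "10.0.0.1", 1000 + 600)       # 10 minutes later
c = identify_visitor(table, "10.0.0.1", 1000 + 3 * 3600)  # 3 hours later
print(a is b, a is c, len(table["10.0.0.1"]))
```

In this sketch the second hit joins the first visitor's record, while the third hit, arriving hours later from the same ID, is treated as a new visitor.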

The Visitor Centric Data Modeling described above has a
very important and powerful benefit for real world
applications. Many systems or websites will use multiple servers
either mirroring each other or each handling a different part 
of a website. Extremely busy websites will often use an 
array of servers to handle the extreme load of traffic. Other 
websites may have a secure server area that resides on a 
special machine. 

Whether for robustness or functionality, multiple server 
architecture is a common practice and appears to create a 
unique problem for internet traffic analysis and reporting. 
Each web server 500 will create its own log file 510, 
recording entries from visitors as they travel through the 
website. Often, a single visitor will create log entries in the 
log file 510 for each web server 500, especially if the web 
servers 500 perform different functions of the website. 

It is desirable to be able to merge and correlate more than 
one log file 510 so as to have a complete and single record 
of a particular visitor. The Visitor Centric Data Modeling
described above makes this ability automatic. Since each hit
is uniquely identified to a particular visitor and the
timestamp of the hit is recorded, determining the order and
location of the hits does not require any additional
engineering. The system and method of the present invention will
automatically correlate the multiple log files as if they were
coming from a single log file.

Buffer Update Module (240)

FIG. 8 is a flowchart and schematic diagram illustrating a
preferred control routine for the buffer update module 240 of
FIG. 3. The control routine starts at step 1000, where it is
determined if the log line 512 (hit) is from a new day by
analyzing the timestamp 602'" of the log line 512. If the log
line 512 is the first of a particular day, then control continues
to step 1010. Otherwise, control jumps directly to step 1020.

At step 1010, the database buffer 250 is preloaded with
any existing contents for that day from the actual database
300. Control then continues to step 1020.

At step 1020, the visitor record identified or created by the
visitor identification module 230 is located in the database
buffer 250. The located visitor record 1040 is shown
schematically under the locate visitor record step shown in
FIG. 8.

Control then continues to step 1030, where the located
visitor record 1040 is updated and new information for that
visitor is inserted into the located visitor record 1040. Traffic
information is preferably updated for the visitor. If the
located visitor record 1040 is a new visitor record, then
domain, referral, and browser information is preferably
inserted into the located visitor record 1040. All visitors
preferably have their path information updated with any new
page view information. The updated visitor record 1050 is
shown schematically below the update record step 1030.

The timestamp 602'" of the log line 512 is used to
determine the order of the events that took place. An
illustrative example is shown in FIG. 8. In the example
shown, a particular visitor is recorded as looking at Page A
1060 first and then Page C 1070. If the next log line 512
processed from the log file 510 indicates that the visitor
looked at Page B 1080, the buffer update module 240 (at step
1030) checks the timestamp 602'" of the log line 512 to see
where in the chain of events the page belongs. In the
example shown, Page B 1080 occurred between Page A
1060 and Page C 1070. Thus, Page B 1080 is inserted into
the visitor record between Page A 1060 and Page C 1070.
In this manner, the system 100 is able to update and correlate
visitor data even if it is out of order in the log file 510.

This automatic processing of multiple log files 510 came
from the discovery that a single multi-threading web server,
such as Netscape, may not log all hits sequentially in time.
Due to the nature of multi-threading applications, it is
possible that a single log file 510 may contain hits out of
chronological order. The system and method of the present
invention were therefore designed to handle this situation
properly by checking the timestamp 602'" of each log line
512 and inserting the information in the log line 512 into the
appropriate place in the retrieved visitor record 1040 based
on the chain of events. With this functionality, the processing
of multiple load-balancing log files 510 is as simple as
reading two log files instead of one.

The operation of the database buffer 250 will now be
explained in more detail. As discussed above, the log engine
200 contains an internal database buffer 250 that mirrors part
of the actual database 300, preferably in RAM. This allows
the log engine 200 to correlate and update visitor records
quickly for each hit without accessing the actual database
300 for each hit. Data is correlated and cached into the
database buffer 250, which stores the data temporarily while
processing the log file 510. When processing of the log file
510 is completed, the database buffer 250 is written back to
the database 300 in one step.

The use of a database buffer 250 results in more RAM
usage, but has the advantage of lowering the overhead of
database access, resulting in faster processing times. By
pre-inspecting the log files 510, the log engine 200
determines the time ranges being used and reads the appropriate
data into the database buffer 250. The database buffer 250
allows Urchin to avoid reading and writing to the database
300 for each log line 512. Instead, the log engine 200 is able
to make updates to the visitor tables 310 and the data tables
315 in memory (through the database buffer 250) and then
read and write the entire data block to and from the database
300, which is preferably stored on disk, only once.

Database Buffer (250)

FIG. 9 is a schematic representation of the contents of the
database buffer 250. As discussed above, the database buffer
250 mirrors a portion of the database 300, preferably in
RAM. Thus the visitor tables 310' and data tables 315' in the
database buffer 250 have the same format as the visitor
tables 310 and data tables 315 in the actual database 300.

Because the database buffer 250 is loaded with data from
the database 300, the visitor tables 310' and data tables 315'
in the database buffer 250 are also relational. The data is
centered in the visitor table 310', creating a Visitor Centric
Data Model. The visitor table 310' contains a partially filled
hash table 320' that is used for quickly seeking visitor
records. Below the partially filled hash table 320', the actual
records 325' contain data about each visitor, such as hits,
bytes, time, etc. Each unique visitor will have their own
record in the visitor table 310'. As each log line 512 is
processed and identified to a particular visitor, that visitor's
record is updated in the visitor table 310' within the database
buffer 250.

Like the visitor table 310 in the actual database 300, the
visitor table 310' in the database buffer 250 is relational in
nature and has a relations area 330' that contains pointers
335' to the data tables 315'. Like the data tables 315 in the
actual database 300, each of the data tables 315' in the
database buffer 250 stores a different visitor parameter, such
as domain, browser, or referral.

Each data table 315' contains a hash table 340', a rank
table 345', a record table 350', and a string table 355'. The
hash table 340' is used to seek records in the record table
350'. The rank table 345' is used to keep track of the top
entries in the record table 350' based on the number of
visitors using the parameter associated with the data table 315'.
This is useful for quick access to reports. The record table
350' stores the actual records within the data table 315',
including the traffic information associated with the
parameter associated with the data table 315'. The record table
350' does not store the value of the parameter. Instead, the
record table 350' contains a pointer to a record in the string
table 355'. Each of these subtables (320, 325, 330, 340, 345,
350, 355) has fixed width records, allowing for efficient
reading, writing, and copying of the entire data sets. In
addition to the fixed width nature of the subtables, the records
in the subtables are allocated in large blocks. Memory
allocation is not necessary for each new record individually.

Besides using efficient hashing algorithms for processing
the data, resizing of the database buffer 250 is done so that
data tables 315' and the hash table 320' in the visitor table
310' are partially empty. This allows new records to be
created instantly without allocating additional memory. The
gray areas in the data tables 315' and the hash table 320' in
the visitor table 310' indicate the used portions. As the tables
reach a predetermined fullness threshold, they are preferably
increased in size.

Once the processing of the log file 510 is complete, the
data tables 315' and the visitor table 310' are written back
into the actual stored database 300. The subtables are written
separately so that empty records are not stored on the disk
that holds the actual database 300. However, the fixed width
nature of the subtables allows for efficient writing of entire
blocks of data to the actual database 300. The use of the
database buffer 250 increases the speed of the log engine 200
by avoiding frequent memory allocation and disk access. By
caching information in volatile memory (in the form of the
database buffer 250), and reading and writing fixed sized
blocks of data, the log engine 200 is extremely fast.

DNS Resolver Module (260)

When a web server 500 receives a request for a web page,
the web server 500 can either log the IP address of the visitor
or it can use DNS to resolve the host and domain
information of the visitor. While domain information is valuable for
market analysis purposes, the resolution can add significant
overhead to the web server 500 and delay the response of the
web server to the end user. It is therefore desirable to pass
the task of DNS resolving onto the system 100 of the present
invention. This allows the web server 500 to stay as light and
quick as possible for visitors accessing the website.

One of the biggest and most time consuming tasks in
processing web server log files 510 and creating valuable
reports is the processing of the reverse DNS of the IP
numbers. Each IP number must be converted to a host/
domain name by using the distributed DNS system of the
Internet. While the local name server may cache many of the
answers, most will likely need to go out to the Internet for
resolution.

The speed and scalability of the present system 100 is one
of its advantages within the operations of large hosting
companies. Whether processing single large websites or
hundreds of thousands of small websites, the speed of the
DNS resolver module 260 is important. The DNS resolver
module 260 uses several innovative techniques for
improving the speed and accuracy of the process, as will be
described in more detail below.

For each IP number that needs resolving, a query is sent
out to the Internet, where it bounces around a few times in
the DNS system before coming back with the answer. This
can take up to a couple of seconds, and sometimes the
answer never comes back. As far as the local system is
concerned, the bulk of this time is spent waiting for the
response. An aspect of the present invention is the discovery
that, since each of the queries is separate and unique, the
processing can be done in parallel using multithreading

operation without the additional overhead. Besides
improving the overall speed and accuracy, the porting of the
software is simplified, as it depends on fewer library calls.

The DNS resolver module 260 generally uses the User
Datagram Protocol (UDP) on top of the IP network protocol.
The UDP protocol has inherent parallel capabilities. Each
query in the protocol is sent like a letter and uses a
connectionless socket. Thus, multiple queries can be sent
simultaneously without waiting for responses. Multiple
responses can be received at any time and in any order.
There is no guarantee that all of the answers will return or that
they will return in order. But, if the
queries are tracked with an ID number, this UDP protocol
can be used effectively to parallelize the DNS resolving
operation without the overhead of threads.

FIG. 10 is a schematic diagram illustrating the
operation of the DNS resolver module 260. The DNS resolver
module 260 communicates with a local name server 1100.
The local name server 1100 is part of the Internet 1110 DNS
system, but resides in the local network as a primary
caching name server acting as a relay between the DNS
resolver module 260 and the multiple DNS servers in the
Internet 1110.

The communication between the DNS resolver module
260 and the local name server uses several UDP sockets
1120. The UDP sockets 1120 are set up and destroyed only
once. Once the UDP sockets 1120 are established, the DNS
resolver module 260 sends groups of queries 1130. The
queries 1130 are represented by "Q" boxes, and the
responses (or answers) 1140 are represented by "A" boxes.
The local name server 1100 relays the queries 1130 and
answers 1140 to the Internet 1110 using a built-in DNS
system. The local name server has caching ability and will
remember recently asked queries 1130 and answer
immediately instead of sending them on to the Internet 1110.

One of the keys to shortening the processing time is to get
as many queries 1130 out in the Internet 1110 at one time.
This shortens the waiting significantly. Without the use of
threads, the DNS resolver module 260 takes advantage of
the UDP protocol, and goes through a loop of sending and
reading queries 1130 and answers 1140, as will be described
in more detail below. Without waiting for all answers 1140
to return or for thread controls to be freed up, the DNS
resolver module preferably sends as many queries 1130 as
possible out into the Internet 1110.

As incoming answers 1140 are decoded and the ID
numbers are matched with the originating queries 1130, the
IP numbers are efficiently resolved in a manner that truly
parallelizes the waiting and thus dramatically reduces the
processing time without the overhead of threads.

During the flood of queries 1130 and answers 1140, the
DNS resolver module 260 goes through a primary loop of
sending queries 1130 and reading answers 1140. The kernel

techniques. The overall waiting can be done all at once level sockets and the local name server 1100 can only handle 

instead of sequentially, thus shortening the overall process- so many requests simultaneously, and will drop excess 

ing considerably. 55 queries 1130 if capacity is reached. While having a few (i.e., 

For example, if ten queries are each resolved in one less than 10%) of the queries 1130 dropped is acceptable, 

second each, normal overall processing time would be ten having too many queries 1130 dropped will result in a large 

seconds. However, by making the operation parallel so that percentage of retries, creating additional work and actually 

all ten queries are processed simultaneously, then the overall slowing the overall processing time. However, it is desirable 

processing time could be reduced to one second. 60 to send queries 1130 as rapidly as possible. What is needed 

In practice, however, multithreading systems, such as is a feedback loop that can adjust the rate at which queries 

those based on the use of POSIX threads and BIND 8.2, 1130 are sent and the waiting time for answers 1140. 

carry a significant overhead, and the setting up of sockets FIG. 11 is a flowchart and schematic diagram of a 

and memory locking reduces the benefits of the multithread- feedback loop control routine preferably used by the DNS 

ing. Instead, the DNS resolver module 260 is not based on 65 resolver module 260. A resolver loop 1150 controls a loop 

threads, but takes on the advantage of the parallel nature of that cycles between sending and reading queries 1130 and 

the underlying protocols themselves to simulate threading answers 1140. 
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The control routine starts at step 1160, where a group of 
queries 1130 are sent through the UDP sockets 1120. Once 
the queries 1130 are sent, control continues to step 1170, 
where the resolver loop 1150 will try reading answers 1140 
for a predetermined amount of time (Timeout). Once the 
Timeout is reached, the resolver loop will compare how 
many queries 1130 were sent against how many answers 
1140 were received, and adjust the Timeout accordingly. 
Control then returns to step 1160. 
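
The Timeout adjustment at the end of each cycle can be illustrated with a minimal Python sketch. The text states only that the Timeout is adjusted after comparing queries sent against answers received. The 10% tolerance below is borrowed from the acceptable drop rate mentioned earlier, while the function name, the growth factor and the upper bound are assumptions made purely for illustration.

```python
def adjust_timeout(sent, received, timeout,
                   tolerance=0.10, grow=1.5, max_timeout=5.0):
    """Return the Timeout (in seconds) to use for the next send/read cycle.

    tolerance, grow and max_timeout are illustrative assumptions,
    not values taken from the specification.
    """
    if sent == 0:
        return timeout                      # nothing was sent, nothing to compare
    missing = (sent - received) / sent      # fraction of queries without an answer
    if missing > tolerance:
        timeout = min(max_timeout, timeout * grow)   # wait longer next cycle
    return timeout
```

With these assumed values, a cycle that loses half of its answers lengthens a 1.0 second Timeout to 1.5 seconds, while a cycle that loses only 5% leaves it unchanged.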

In addition to the socket speed capabilities, certain queries 
1130 will inherently take longer than others. Some queries 
1130 may need to go halfway around the world before 
resolving is completed. To minimize this effect, the resolver 
loop 1150 preferably begins with a very aggressive (short) 
Timeout, and progressively increases the Timeout to wait for 
the answers 1140 that are taking longer to arrive. The 
resolver loop 1150 will actually go through multiple loops 
and, at a slower pace, reattempt queries 1130 that were never 
answered. This adaptable resolving speed control gives the 
DNS resolver module 260 the ability to process the bulk of 
queries 1130 very quickly, and minimize the impact of a few 
slow or non-responding answers 1140. 
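
The effect of starting with a short Timeout and lengthening it on later passes can be shown with a small simulation. This is a sketch under stated assumptions: the answer delays are supplied as plain numbers rather than measured on a network, and the starting Timeout, the doubling factor and the number of passes are hypothetical.

```python
def resolve_in_passes(latencies, first_timeout=0.25, factor=2.0, max_passes=4):
    """Simulate repeated passes over pending queries with a growing Timeout.

    latencies maps a query ID to its simulated answer delay in seconds,
    or to None when no answer ever arrives.
    Returns (answers_per_pass, unresolved_ids).
    """
    pending = dict(latencies)
    timeout = first_timeout
    answers_per_pass = []
    for _ in range(max_passes):
        if not pending:
            break
        # Queries whose answer arrives within the current Timeout are resolved.
        answered = [qid for qid, delay in pending.items()
                    if delay is not None and delay <= timeout]
        for qid in answered:
            del pending[qid]
        answers_per_pass.append(len(answered))
        timeout *= factor                   # later passes wait longer
    return answers_per_pass, sorted(pending)
```

In this simulation the fast answers are collected in the first pass, the slow ones in later passes, and queries that never answer remain in the unresolved list once the pass limit is reached.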

The DNS resolver module 260 is preferably configured 
with the ability to increase its resolving percentage and 
overall accuracy by 
adapting the query level. Under normal DNS resolving, the 
IP number is mapped to a specific hostname. For example, 
the IP number 202.110.52.16 may map to the hostname: 

diall41-sddc2.npop43.aol.com 
While it may be interesting to see the "diall41-sddc2.npop43" 
part of the hostname, one is typically only 
interested in the domain part (e.g., "aol.com") of the 
answer 1140. The first part of the answer 1140 is specific to 
each provider and does not contribute to the demographic- 
type reporting that the present system 100 is preferably 
designed to provide. 
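
Reducing a hostname to its domain part can be sketched as follows. The text does not say how the domain part is extracted, so keeping the last two labels is an assumption; it works for names such as the one above but would need more labels for country-code domains of the form "example.co.uk".

```python
def domain_part(hostname, labels=2):
    """Return the trailing labels of a hostname, e.g. the last two.

    The two-label default is an illustrative assumption.
    """
    parts = hostname.rstrip(".").lower().split(".")
    return ".".join(parts[-labels:])
```

Applied to the hostname in the example above, the function returns "aol.com".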

In many networks, especially government, military, and 
small private networks, individual IPs are not always 
mapped to anything. The query 1130 of a specific IP may 
return with an answer 1140 of "unknown host", which 
means that not all of the IPs are mapped back to 
hostnames. Unfortunately, this can reduce the resolving 
percentage by 20 or 30 percent, and skew the demographic 
data away from the non-resolvable networks that are often 
found in government, military, and educational settings. 

To make up for this deficiency, the DNS resolver module 
260 preferably deploys an adaptable resolving level mechanism 
that attempts to find out who controls the network in 
question if the hostname query 1130 returns unsuccessfully. 

FIG. 12 is a schematic diagram of how a preferred 
embodiment of the adaptable resolution mechanism oper- 
ates. An unresolved IP number 1180 enters the DNS resolver 
module 260. The DNS resolver module 260 will make 
multiple attempts at resolving the IP number by sending out 
multiple queries 1130 one at a time using different query 
information. The first query 1130a will attempt to resolve 
the entire specific IP number. If that returns unsuccessful, 
then a second query 1130b will attempt to resolve the 
Class-C network address (a Class-C network address is 
equivalent to the first three parts of an IP address). 

If the second query returns unsuccessful, a third query 
1130c will attempt to resolve the Class-B network address. 
If the third query is unsuccessful, a fourth query 1130d will 
attempt to resolve the Class-A network address. Many times, 
the Class-C or Class-B network addresses will resolve 
correctly when the IP address did not. 

This technique improves the resolving accuracy dramatically 
and improves overall performance speed. In the case of 
government, military, educational and other private 
networks, "unresolved" percentages have been observed to 
go from 35% down to 8%, and "k12.us" and "navy.mil" 
show up in the top domains reports using the adaptable 
resolving level mechanism of the present invention. While 
these domains are not resolving their individual IPs, the 
general source of the traffic is obtained. 
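
The fallback from the full IP number to progressively larger networks can be sketched in Python. The specification does not state how the network-level queries are formed, so the use of conventional reverse-lookup names under in-addr.arpa is an assumption, as are the function names and the injected `lookup` callable, which stands in for a real DNS query.

```python
def reverse_names(ip):
    """Names to try, in order: full IP, Class-C, Class-B, Class-A network."""
    octets = ip.split(".")
    return [".".join(reversed(octets[:n])) + ".in-addr.arpa"
            for n in (4, 3, 2, 1)]


def resolve_adaptive(ip, lookup):
    """Try each level in turn until one resolves.

    lookup(name) is a hypothetical callable returning a domain string,
    or None for an unsuccessful answer.
    Returns (level, answer), where level 0 is the full IP, or (None, None).
    """
    for level, name in enumerate(reverse_names(ip)):
        answer = lookup(name)
        if answer is not None:
            return level, answer
    return None, None
```

Passing a dictionary's `get` method as `lookup` is enough to exercise the fallback order without touching the network.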
Using the above-described techniques, the DNS resolver 
module comprises a nested-loop, adaptable system that is 
fast and efficient. The nested-loop architecture is shown in 
FIG. 13, which is a flowchart of a preferred control routine 
for the various loops within the DNS resolver module 260. 
The control routine begins by initializing some variables, 
including five configuration variables 1190: 

resolution target (RT); 

number of loops (NL); 

queries per write (NQ); 

interquery delay (DQ); and 

wait timeout (WT). 

These five settings represent starting points for operation. 
They may be modified at runtime using the feedback mechanism 
discussed above in connection with FIG. 11. The 
control routine comprises a main loop 1200, a visitor loop 
1210 nested within the main loop 1200, and a read loop 1215 
nested within the visitor loop 1210. Dashed lines indicate 
asynchronous non-loop flow tasks. Sockets are initialized 
before the main loop 1200 begins. 

The control routine begins at step 1220, where it is 
determined if the loop should continue. The loop 1200 will 
continue as long as the "number of loops" (NL) has not been 
reached and the "resolution target" (RT) has not been 
reached. NL is incremented once the loop begins and RT is 
adjusted after each "decode answer" step 1290, which will 
be described below. 

The NL and RT variables serve an important purpose. 
They allow a high resolving target to be set, while setting an 
ultimate timeout. Depending on the size of the data, the 
number of sites, and the amount of time available, system 
administrators can modify these variables before operation. 
Once the resolution target RT or the number of loops NL is 
reached, the control routine will exit and clean up. 
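
The five configuration variables and the continue test of step 1220 can be sketched as a small data structure. The default values, the treatment of RT as a fraction of resolved visitors, and all names below are assumptions for illustration only; the specification gives no concrete values.

```python
from dataclasses import dataclass


@dataclass
class ResolverConfig:
    # All defaults are illustrative, not taken from the specification.
    resolution_target: float = 0.95   # RT, assumed to be a fraction resolved
    number_of_loops: int = 5          # NL, upper bound on main-loop passes
    queries_per_write: int = 50       # NQ, queries sent per batch
    interquery_delay: float = 0.0     # DQ, seconds between queries
    wait_timeout: float = 0.25        # WT, seconds spent reading answers


def main_loop_should_continue(cfg, loops_done, resolved, total):
    """Step 1220: continue only while neither NL nor RT has been reached."""
    if total == 0:
        return False
    target_reached = resolved / total >= cfg.resolution_target
    return loops_done < cfg.number_of_loops and not target_reached
```

Setting a high RT together with a modest NL gives the behavior described above: the routine aims for a high resolving percentage but still stops after a bounded number of passes.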

If NL and RT have not been reached, control continues to 
the visitor loop 1210, whose purpose is to build and send 
queries for each unresolved visitor in the visitor table 310'. 
The visitor loop 1210 starts at step 1230, where the next 
unresolved visitor record from the visitor table 310' is pulled 
and a query 1130 is built. An ID number 1250 from the 
visitor table 310' is used in the building of the query 1130 so 
that it can be tracked later on as a response. 
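
How an ID number can travel inside a query and be recovered from the response is shown below using the standard DNS message layout, in which the first two bytes of the header carry a 16-bit identifier that the server copies into its answer. The text does not name the record type, so the PTR question and the in-addr.arpa name are assumptions, as are the function names.

```python
import struct


def build_ptr_query(query_id, ip):
    """Build a DNS query packet whose header ID is query_id (0..65535)."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    name = ".".join(reversed(ip.split("."))) + ".in-addr.arpa"
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.split(".")) + b"\x00"
    question = qname + struct.pack("!HH", 12, 1)   # QTYPE=PTR, QCLASS=IN
    return header + question


def packet_id(packet):
    """Read the 16-bit ID back from the first two bytes of a DNS packet."""
    return struct.unpack("!H", packet[:2])[0]
```

Because the ID is only 16 bits wide, a visitor-table ID larger than 65535 would need to be mapped onto this range; how the described system handles that is not stated.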

Next, at step 1240, the query 1130 is sent to the UDP 
sockets 1120. The UDP sockets 1120 are used in round robin 
fashion, which minimizes the waiting for buffer 
controls. 

A counter keeps track of how many queries 1130 have 
been sent in the current batch. Control then continues to step 
1260, where the counter is checked against the NQ variable. 
If NQ has not been reached, control loops back to step 1230. 
An optional interquery delay (DQ) step 1270 can be inserted 
between steps 1260 and 1230 to keep the visitor loop 1210 
from running too fast. 
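
The batch-sending portion of the visitor loop (steps 1230, 1240 and 1260) can be sketched as follows. The sockets are represented by arbitrary objects and the actual transmission by an injected `send` callable, both of which are assumptions so that the sketch runs without a network; the optional interquery delay is omitted.

```python
def send_batch(pending_ids, sockets, nq, send):
    """Send up to nq queries, rotating through the sockets in round robin.

    pending_ids is a list of unresolved query IDs and is consumed in place.
    send(socket, query_id) is a hypothetical callable that transmits a query.
    Returns the number of queries sent in this batch.
    """
    counter = 0
    while pending_ids and counter < nq:            # step 1260: compare with NQ
        query_id = pending_ids.pop(0)              # step 1230: next unresolved visitor
        send(sockets[counter % len(sockets)], query_id)   # step 1240: round robin
        counter += 1
    return counter
```

Rotating through the sockets spreads consecutive queries across separate send buffers, which is the stated reason for the round robin order.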

If NQ has been reached, which occurs when all the queries in the
batch have been sent, the counter is reset and control then continues
to the read loop 1215. The read loop 1215 continues until the WT
timeout variable is reached.

At step 1280, any buffered incoming answers 1140 are read from the
UDP sockets 1120. Next, at step 1290, each answer 1140 is decoded.
Control then continues to step 1300.

At step 1300, it is determined if the answer 1140 is successful. If
the answer 1140 is successful, control continues to step 1310, where
the visitor table 310' is updated with the domain information. Control
then continues to step 1330.

If, at step 1300, it is determined that the answer 1140 is
unsuccessful, control continues to step 1320, where the record in the
visitor table 310' is modified by changing the resolution status. The
resolution status is used to control the resolution level, as
discussed above. If the answer 1140 comes back as "unknown" then the
resolution status is changed for that visitor record, indicating that
the next query 1130 should attempt to resolve the larger network
instead of the specific IP. Control then continues to step 1330.

At step 1330, the read loop 1215 condition is checked by determining
if the incoming UDP sockets 1120 are empty and if the timeout WT has
been reached. If the incoming UDP sockets 1120 are empty and the WT
timeout has been reached, the read loop 1215 ends, and control flows
back to the visitor loop 1210 at step 1340. Otherwise, the read loop
1215 continues, and control loops back to step 1280.

At step 1340, it is determined if the resolution target (RT) has been
reached. If it has, the visitor loop 1210 ends, and control flows back
to the main loop 1200 at step 1350. Otherwise, the visitor loop 1210
continues at step 1230 with the next batch of unresolved queries.

At step 1350 of the main loop 1200, the WT timeout is adjusted
(increased for the next loop). Control then continues to step 1220,
where NL is incremented and NL and RT are checked, and the entire
process starts over again if neither NL nor RT has been reached.

With minimal overhead, the DNS resolver module 260 takes advantage of
the UDP protocol and maximizes the parallelization of the processing.
Through a series of nested loops and control parameters, the DNS
resolver loop is able to adapt both speed and level in order to meet
the resolving target as quickly as possible. Multiple rounds and
levels of queries 1130 are sent to cover lost or failed attempts,
thereby increasing overall accuracy and resolution percentage
dramatically. Thus, system administrators can put a cap on overall
processing time, while maintaining a high resolution target.

Database Update Module (270)

Once the log file processing is complete and all the log lines 512
(hits) are represented in the visitor table 310' on the database
buffer 250, the visitor table 310' is sorted (if multiple websites are
represented). The database buffer 250 is then output to the database
300 using the database update module 270.

FIG. 14 is a flowchart and schematic diagram illustrating a preferred
control routine for the database update module 270. The schematic
diagram below the control routine steps illustrates what is occurring
to the data during the control routine.

The control routine starts at step 1360, where the visitors in the
database buffer 250 are sorted based on their associated website
identification. Preferably using a quicksort algorithm, the records in
the database buffer 250 are sorted into groups that belong to the same
website. If only one website is represented by the log file 510, then
step 1360 is trivial. However, in the case of multiple websites, the
database buffer 250 is sorted into groups of visitors.

The control routine then continues to step 1370, where the database
300 is opened. Then, at step 1380, the database 300 is updated with
the data in one of the visitor groups 1400. The process then continues
to step 1390, where the database 300 is closed.

The control routine then loops back to step 1370, and the database
update process is repeated for each visitor group 1400. By processing
the records in groups, the overhead created by accessing the database
300 is reduced.

Database (300)

FIG. 15 is a schematic diagram illustrating the main components of the
database 300. As discussed above, the database 300 contains a visitor
table 310 and data tables 315. The structure is relational in nature
as the visitor table 310 relates to information stored in the data
tables 315.

The database 300 includes a methods module 1410, which provides an
interface for accessing, seeking, and inserting data into the visitor
and data tables 310 and 315. Both the log engine 200 and the report
engine 400 use the methods module 1410.

The methods module 1410 is the only module that is allowed to directly
access the data in the database 300. This creates a modularity to the
database 300, in which the format of the visitor table 310 and/or the
data tables 315 can be modified without changing the interface to the
other modules in the system 100.

Report Engine

As ISPs add thousands of web sites to a single system, the creation of
reports can begin to take as long as processing the data. With an ever
increasing number of reports to create, the disk space and time needed
to accomplish this side of the task can become a problem. The report
engine 400 provides a centralized system that contains a single copy
of the report templates and icons needed to generate reports, and
delivers the specific reports for a particular web site only when
requested.

The report engine 400 only stores the data for each web site, and not
the specific reports. Since the reports are web-based, they can be
delivered on the fly as requested through the Common Gateway Interface
(CGI) of the web server.

FIG. 16 is a schematic diagram of a preferred embodiment of the report
engine 400. The report engine 400 comprises a session parser module
1420, an authentication module 1430, a data query module 1440, a
format output module 1450 and a template/dictionary module 1460.

In operation, a report request 540 received by the web server 520 from
an end-user is sent by the web server 520 to the report engine 400
through the Common Gateway Interface (CGI) 1470 of the web server 520.
The CGI 1470 is a standard mechanism used to allow an application to
process input and deliver content dynamically via the web.

The session parser module 1420 reads the input from the report request
540 and sets internal variables accordingly. The variables are then
used to determine the data to use, the report to create, and the
format of delivery.

The authentication module 1430 verifies that the end-user that sent
the report request has permission to view the requested report. Upon
verification, the data query module 1440 queries the database 300 for
the raw data needed to generate the requested report.

The raw data is passed to the format output module 1450, which uses a
set of templates from the template/dictionary module 1460 to format
and create the report 550 to be sent back to the end-user via the web
server 520. The use of templates and dictionaries in the template
module allows for easy customization of the reporting format.
Templates can be used to change branding and the overall look and feel
of the report interface. Dictionaries in the template/dictionary
module 1460 can be used to change the report language on the fly. The
end-user can toggle which dictionary is used for reporting directly
through the CGI interface 1470.

The access and delivery of reports is preferably controlled using a
Javascript application, which is preferably delivered to the end-user
upon the first report request 540. The Javascript application provides
the mechanisms for displaying report content and querying for new
reports.

The operation of each of the modules in the report engine 400 will now
be explained in more detail.

Session Parser Module (1420)

The session parser module 1420 is used to read and access data
specific to the type of request being made. Furthermore, hosting
operations are creating control panel interfaces with which customers
can log in and access all of their tools and applications from one
web-based location. Customers log in once to the control panel, and
then have access to e-mail, website builder tools, newsgroups, etc.

In order to integrate the present system 100 into custom control panel
interfaces, the session parser module 1420 is a flexible,
session-sensitive system that allows the present system 100 to work
seamlessly with the user's control panel.

FIG. 17 is a flowchart and schematic diagram of a preferred control
routine for the session parser module of FIG. 16. User requests for
reports are generated and passed to the report engine 400 from the web
server 520. Since the system 100 only contains one report engine 400,
parameters 1500 are passed to the session parser module 1420 within
the report engine 400 in order to determine which report to generate.
The passing of parameters 1500 is built into the navigation of the
reporting interface, i.e., as the end-user clicks through the
navigation menus within the interface and selects a report, the proper
parameters 1500 are automatically sent to the session parser module
1420.

The parameters 1500 preferably contain three parts. The session-id
1510 is used to keep track of which user is logged into the system.
The application data 1520 contains the report-specific parameters used
to select the correct report. The user session info is an optional set
of parameters that can be used to integrate the system 100 into a user
control panel containing multiple applications.

The control routine begins with the read input step 1540, which parses
the list of parameters 1500 and separates the data into "name-value
pairs." Control then passes to the identify variables step 1550, which
uses a pre-determined configuration 1560 to match the external
name-value pairs with internal variables. This allows the system 100
to recognize custom variables being used by proprietary control panels
and other user interface mechanisms.

Authentication Module (1430)

FIG. 18 is a flowchart of a preferred control routine for the
authentication module 1430. After the specific variables of the report
request and session are determined, the authentication module 1430
provides a flexible way to check access authorization for report
requesters. While the authentication module 1430 may either use
built-in functionality or access pre-existing user databases, the
basic steps of the control routine are the same.

The control routine starts at step 1600, where the identity of the
user, the website and the report being requested are determined based
on data from the session parser module 1420. The control routine then
continues to step 1610, where the validation of the user is performed.

Based on configuration, step 1610 can either access internal
configuration parameters, listing users and reports, or it can access
an external source (not shown) for user validation. If the user is
validated for the report request, then control continues to step 1630,
where the report request is passed to the data query module 1440. If
the validation fails, control jumps to step 1640, where an error
response is returned to the user.

Data Query Module (1440)

FIG. 19 is a flowchart of a preferred control routine for the data
query module 1440. This data query module 1440 accesses the methods
module 1410 in the database 300 in order to receive a report-ready raw
data set.

The control routine starts at step 1650, where the identification of
the requested report and other parameters parsed previously by the
session parser module 1420 are formatted into a query that can be
passed to the database 300. The format of the query is based on the
specification of the methods module 1410 in the database 300.
Typically, SQL type queries are created at step 1650.

Next, at step 1660, the query generated at step 1650 is sent to the
database 300. Then, at step 1670, the data from the database 300 is
received and stored in a buffer. The buffer now contains the raw
unformatted data for the requested report. Control then continues to
step 1680, where the data received and stored in the buffer is passed
to the format output module.

Format Output Module (1450)

FIG. 20 is a flowchart of a preferred control routine for the format
output module 1450. The control routine starts at step 1690, where
templates and dictionaries are obtained from the template/dictionary
module 1460. The templates and dictionaries are chosen based on the
type of report and language desired.

Control then continues to step 1700, where the requested report is
formatted by merging the data stored in the buffer by the data query
module 1440 with the chosen templates and dictionaries. Variables are
replaced with values, and words are replaced with dictionary entries.
The result is a web-based report, ready for delivery and custom
created for each user. The report is delivered to the user at step
1710.

Javascript System

The report engine 400 preferably uses a Javascript system comprising a
special combination of HTML and Javascript to produce interactive
reports that are extremely efficient and easy to use. The basic
concept is that the Javascript, which is loaded into the user's web
browser, contains the code necessary to create the visual reports.
Once loaded, the web server 520 only needs to deliver data to the web
browser, which is then rendered on the user side of the Javascript
system.

The benefit of the Javascript system is fewer connections to the web
server 520. The user can experience real-time navigation, as many of
the controls do not require new connections to the web server 520.
Opening menus and sorting data occur directly in the web browser. Used
in conjunction with the CGI Reporting technology described previously,
the Javascript system is extremely efficient and scalable for even the
most crowded web server communities.

FIG. 21 is a schematic diagram of a preferred embodiment of the
Javascript system. The system comprises an end-user web browser side
1810 and a server side 1820.

When the end-user first accesses the report engine 400, the report
request is sent to the web server 520, which returns the
frameset/application 1830 and icons 1840. A Javascript application
1850 resides hidden in the parent frameset 1860. The Javascript
application 1850 then draws the two frames: the navigation frame 1870
and the report frame 1880. The navigation frame 1870 is drawn directly
from the Javascript application 1850.

When the end-user wants to see a different attribute of the 
report or data, they can click on navigational and control 
elements in either the navigation frame 1870 or the report 
frame 1880. These control elements affect variables in the 
code of the frameset 1860, which then redraws the necessary 
subframes. If the end-user has selected something that 
requires a new data set, only the data is requested and 
delivered from the web server 520 through the report engine 
400. The Javascript application 1850 loads the new data 
1890, and draws the subframes and reports accordingly. 

Real-Time Reporting 

The demand for real-time reporting comes from many 
sources. In today's fast-paced economy, marketing and 
advertising managers wish to make rapid decisions and have 
immediate access to data as it occurs. Likewise, webmasters 
and system administrators, who are charged with managing 
critical website systems and servers, need real-time monitoring 
tools in order to keep a finger on the pulse of their 
systems. The ability to monitor activity in real-time gives the 
system administrators the ability to react to problems and 
potential attacks. Likewise, managers can monitor marketing 
strategies and ad campaign effectiveness as they are 
released. 

As described previously, the system 100, using the live 
data access control routine shown in FIG. 5, has the ability 
to record web traffic into the database 300 continuously as 
it occurs. Since, as described above, the report engine 400 
creates reports when they are requested, all reports can 
display up-to-date real-time information. In addition to 
general demographic and statistical reports, the system 100 
is preferably configured to create a series of reports that are 
specifically designed to take advantage of real-time data. 
Visitor Monitor 

FIG. 22 illustrates an example of a visitor monitor report 
1900 created by the system 100 of the present invention. The 
report 1900 preferably uses custom templates specifically 
designed for real-time reporting. The report 1900 is a 
web-based interface that provides a "live" real-time look at 
one of several possible data parameters 1910, such as 
visitors, pages, hits, bytes and dollars. The report preferably 
includes a visitor monitor graph 1920 that is preferably 
refreshed approximately every second to reflect new data. 
The data in the visitor monitor graph 1920 preferably moves 
from right to left as time progresses. The current time 1930 
is preferably indicated above the visitor monitor graph 1920. 
In addition to the graphical display, the report 1900 preferably 
displays the current value 1940 of the data parameter 
1910 currently being displayed, as well as the parameter's 
average value for that day 1950. 

By monitoring the visitor data parameter 1910, the current 
traffic level can be monitored as it occurs. Controls 1960 are 
preferably provided that are configured so that the user can 
look at previous data, stop and freeze the graph, or continue 
with current data. 
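
The scrolling graph, the current value and the daily average described above can be modeled with a fixed-width window of samples. This is a minimal sketch: the class name, the window width and the one-sample-per-refresh model are assumptions, and rendering of the graph image is left out.

```python
from collections import deque


class MonitorSeries:
    """Samples for one data parameter (e.g. visitors), newest on the right."""

    def __init__(self, width):
        # Once the window is full, appending drops the oldest sample, which
        # gives the right-to-left movement of the graph.
        self.window = deque(maxlen=width)
        self.day_total = 0
        self.day_samples = 0

    def add(self, value):
        self.window.append(value)
        self.day_total += value
        self.day_samples += 1

    @property
    def current(self):
        return self.window[-1] if self.window else None

    @property
    def day_average(self):
        if not self.day_samples:
            return None
        return self.day_total / self.day_samples
```

Freezing the graph or looking at previous data would amount to rendering a stored window instead of the live one, which this sketch does not attempt.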

A small amount of Javascript is preferably used to control 
the refreshing of the visitor monitor report 1900. In addition, 
the visitor monitor report 1900 preferably uses a small 
amount of Javascript to time and reload the image 1970. The 
image 1970 is generated by the report engine 400, and uses 
the PNG format for compact lightweight operation. Since 
only the image 1970 is reloaded approximately every 
second, the visitor monitor report 1900 does not flicker when 
viewed with most browsers, thus creating an animated 
appearance to the graph 1920. 
Temporal Visitor Drill Down 

The images 1970 loaded into the visitor monitor report 
1900 preferably include an HTML/Javascript image map 
that provides "clickable" drill-down access to detailed infor- 
mation within the visitor monitor graph 1920. The visitor 
monitor report 1900 preferably contains a series of invisible 
rectangles (not shown) which cover the surface of the visitor 
monitor graph 1920. When the end-user clicks within the 
visitor monitor graph 1920, within one of the rectangles, that 
rectangle is mapped to a specific point in time. This time 
information is then compiled into a URL query and sent to 
the server to provide information on that specific point in 
time. 
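
The mapping from a clicked rectangle to a point in time and then to a URL query can be sketched as follows. The base URL, the parameter names `report` and `time`, the timestamp format and the assumption that every rectangle covers the same number of seconds are all hypothetical; the text says only that the time information is compiled into a URL query.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode


def drilldown_url(base_url, graph_start, seconds_per_rect, rect_index):
    """Build the URL requested when the rectangle at rect_index is clicked.

    graph_start is the time represented by the leftmost rectangle.
    """
    moment = graph_start + timedelta(seconds=seconds_per_rect * rect_index)
    query = urlencode({
        "report": "drilldown",                       # hypothetical parameter
        "time": moment.strftime("%Y%m%d%H%M%S"),     # hypothetical format
    })
    return f"{base_url}?{query}"
```

In an image map, each invisible rectangle would carry one such URL, so no computation is needed in the browser at click time.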

FIG. 23 is an example of a temporal visitor drill down 
report 2000 created by the system 100 of the present 
invention, for displaying the time-specific data discussed 
above. All visitors 2010 that were active on the 
website at the selected time are listed by IP address and 
sorted based on the number of hits 2020. Bytes 2030, 
pageviews 2040, and length of visit 2050 are also preferably 
shown for each visitor 2010. The totals 2060 of bytes 2030, 
pageviews 2040, hits 2020 and length of visit 2050 for all 
visitors are also preferably displayed at the bottom of each 
column. 
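
The ordering and column totals of the drill down report can be sketched in a few lines. The record layout (a dictionary per visitor with the keys shown) and the descending sort order are assumptions; the text states only that visitors are sorted based on the number of hits.

```python
def drilldown_rows(visitors):
    """Sort visitor records by hits and compute the column totals.

    Each record is assumed to be a dict with the keys
    'ip', 'hits', 'bytes', 'pageviews' and 'seconds' (length of visit).
    """
    rows = sorted(visitors, key=lambda v: v["hits"], reverse=True)
    totals = {key: sum(v[key] for v in visitors)
              for key in ("hits", "bytes", "pageviews", "seconds")}
    return rows, totals
```

Sorting by hits in descending order places the heaviest clients at the top of the list, which suits the use described next, where administrators look for the visitors responsible for the traffic.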

Administrators can use this drill down capability to 
quickly assess which visitors 2010 are responsible for the 
corresponding web server traffic. Hostile attacks from robots 
and web spiders can also be monitored in real-time. Admin- 
istrators can take action against hostile clients by blocking 
their access to the servers. 
Visitor Footprint 

Beyond monitoring web server usage, the drill down 
capability described above can be taken one step further. Each 
visitor 2010 listed in the Temporal Visitor Drill Down report 
2000 is preferably selectable and linked to provide a visitor 
footprint on that specific visitor. All of the views are 
web-based and linking is preferably accomplished using 
simple HTML and Javascript. When the user selects a link 
on their browser, a new browser window opens and queries 
the report engine 400 for the specific information on that 
visitor. 

FIG. 24 illustrates an example of a visitor footprint report 
2100 created by the system 100 of the present invention. The 
visitor footprint report 2100 preferably contains detailed 
information on the activity of the selected visitor, including 
traffic information 2110, browser information 2120, referral 
information 2130, domain information 2140 and the visitor 
path 2150 (the specific path the visitor took through the web 
site). 

If the visitor shown in the visitor footprint report 2100 is 
responsible for an e-commerce transaction that is processed 
by the system 100, then additional e-commerce information 
2160 is preferably shown in the visitor footprint report 2100. 
If the visitor shown in the visitor footprint report 2100 
looked at multimedia clips that are captured by the system 
100, then additional streaming information 2170 is prefer- 
ably shown in the visitor footprint report. 

The browser information 2120 is preferably analyzed to 
see if it matches a known browser or platform. If the browser 
is recognized then an icon of the browser and platform 2180 
can be optionally shown as part of the browser information 
2120. If the visitor is identified as a robot, then an icon of a 
robot (not shown) can be optionally shown as part of the 
browser information 2120. This can be useful for quickly 
identifying hostile attacks from aggressive robots and 
spiders, which can flood the web servers 500 with requests, 
creating a slowdown in response times. 

The visitor footprint report 2100 can provide insight into 
the usage of the website as well as help analyze specific 
visitors. While the detailed activity of the visitor can be 
monitored, the system 100 preferably does not record, use, 
or display any personal or identification information such as 
e-mail addresses, names, etc. Each visitor, while specific in 
the database 300, preferably remains anonymous. 
System Meter 

FIG. 25 illustrates an example of a system meter report 
2200 created by the system 100 of the present invention. The 
system meter report 2200 is similar to the web-based visitor 
monitor report 1900 shown in FIG. 22. However, instead of 
providing a full-sized analysis tool, the system meter report 
2200 is designed to be small enough to fit on a desktop 
computer screen at all times. 

The system meter report 2200 contains multiple thumb- 
nail sized report images (2210, 2220, 2230, 2240, 2250) that 
all refresh in the same manner as the visitor monitor report 
1900. To access the system meter report 2200, the end-user 
preferably selects a collapse button 1980 (shown in FIG. 22) 
or a "system meter" navigation button (not shown) within 
the visitor monitor report 1900. When the system meter 
report 2200 is requested from the visitor monitor report 
1900, the window containing the visitor monitor report 1900 
preferably closes and a new smaller window appears on the 
desktop computer screen containing the system meter report 
2200. 

The system meter report 2200 is preferably configured so 
that a user can resize the system meter report 2200 (with, for 
example, a computer mouse) creating a compact live web- 
meter that gives them constant monitoring of critical sys- 
tems. The system meter report 2200 is also preferably 
configured so that selecting one of the report images (2210, 
2220, 2230, 2240, 2250) re-opens the full-sized visitor 
monitor report 1900. 

The system meter report 2200 preferably displays graphs 
of visitors 2210, hits 2220, pages 2230, bytes sent 2240, and 
money 2250 (if e-commerce is activated). 

E-Commerce Reporting 

As businesses move from providing passive information 
about their products to providing interactive shopping 
capabilities, successful analysis of internet traffic can pro- 
vide valuable information for making strategic business 
decisions. 

In one preferred embodiment of the present invention, 
Return On Investment Reporting (ROIR) technology is used 
to provide the ability to report on internet traffic in terms of 
revenue. All aspects of the visitor reporting are correlated to 
dollars spent on the website, providing detailed analysis of 
when and where revenue is generated. Marketing and adver- 
tising managers can use this information to track the effec- 
tiveness of banner ads, the location of and behavior of 
shoppers and more. 

The key to this technology is the present invention's 
ability to correlate data in a Visitor-Centric way. The Visitor- 
Centric configuration of the present invention allows the 
system 100 to report on dollars spent in correlation with any 
visitor parameter. 

E-commerce websites use shopping cart software 
(hereinafter "shopping carts") to provide a secure method 
for on-line ordering. Shopping carts allow the end-user to 
add products to their virtual shopping basket, change 
quantities and check out, similar to a normal shopping 
experience. There are many commercial shopping cart products, 
such as Miva's Merchant™ and Mercantec's Softcart™. 

Whether an e-commerce site uses an off-the-shelf product 
or a custom engineered application, the concept is the same. 
The shopping cart software keeps track of each visitor 
shopping session. As products are added to an individual's 
shopping cart, the software updates the visitor's specific 
information. When the visitor decides to check out and 
purchase the products, the shopping cart provides the nec- 
essary shipping and billing forms and can process the 
transaction. 

E-commerce Log File Format 

The internet traffic monitoring and analysis system and 
method of the present invention utilizes the e-commerce log 
files 580 produced by the shopping carts to perform the 
e-commerce data correlation. However, the log file formats 
used by different shopping carts can vary. A preferred 
e-commerce log file format for use with the internet traffic 
monitoring and analysis system and method of the present 
invention is described below. 

The e-commerce log file format is preferably a 
tab-separated, multiline format. The transaction preferably 
begins with the exclamation mark (!) character (which is 
thus prohibited from the rest of the data). The first line of 
the e-commerce log file preferably contains the geographic 
and overall information on the e-commerce transaction. 
Subsequent lines preferably contain details on individual 
products. The preferred basic format of the e-commerce log 
file 580 is as follows: 

    !transfield1      transfield2 . . . 
    productfield1     productfield2 . . . 
    productfield1     productfield2 . . . 
    !transfield1      transfield2 . . . 
    etc. 

Blank fields preferably contain a dash (-) character. The 
preferred format for the transaction line is as follows: 

    !%{ORDERID}%h%{STORE}%{SESSIONID}%t%{TOTAL}%{TAX} 
    %{SHIPPING}%{BILL_CITY}%{BILL_STATE}%{BILL_ZIP}%{BILL_CNTRY} 

where 

    %{ORDERID}      is the order number, 
    %h              is the remote host (see apache.org), 
    %{STORE}        is the name/id of the storefront, 
    %{SESSIONID}    is the unique session identifier of the customer, 
    %t              is the time in the common log format, 
    %{TOTAL}        is the transaction total including tax and shipping 
                    (decimal only, no "$" signs), 
    %{TAX}          is the amount of tax charged to the subtotal, 
    %{SHIPPING}     is the amount of shipping charges, 
    %{BILL_CITY}    is the billing city of the customer, 
    %{BILL_STATE}   is the billing state of the customer, 
    %{BILL_ZIP}     is the billing zip of the customer, and 
    %{BILL_CNTRY}   is the billing country of the customer. 




The preferred format for the product line is: 

    %{ORDERID}%{PRODUCTCODE}%{PRODUCTNAME}%{VARIATION} 
    %{PRICE}%{QUANTITY}%{UPSOLD} 

where 

    %{ORDERID}      is the order number, 
    %{PRODUCTCODE}  is the identifier of the product, 
    %{PRODUCTNAME}  is the name of the product, 
    %{VARIATION}    is an optional variation of the product for colors, 
                    sizes, etc., 
    %{PRICE}        is the unit price of the product (decimal only, 
                    no "$" signs), 
    %{QUANTITY}     is the quantity ordered of the product, and 
    %{UPSOLD}       is a boolean (1|0) indicating if the product was 
                    on sale. 



An aspect of the present invention is the optional provi- 
sion of a plug-in module for existing shopping carts that will 
allow the shopping cart to create the e-commerce log file 580 
in the preferred format. 
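
As an illustration only, a log in the preferred format described above could be read with a short parser such as the following sketch. It assumes that fields are separated by tab characters, that each transaction line starts with "!", and that a lone dash marks a blank field, as stated above; the Python field names and the function name are hypothetical and are not part of the described system.

```python
# Field order follows the transaction and product line formats above;
# the lowercase key names are assumptions made for this sketch.
TRANS_FIELDS = ["orderid", "host", "store", "sessionid", "time", "total",
                "tax", "shipping", "bill_city", "bill_state", "bill_zip",
                "bill_cntry"]
PRODUCT_FIELDS = ["orderid", "productcode", "productname", "variation",
                  "price", "quantity", "upsold"]


def _clean(value):
    """A lone dash marks a blank field."""
    return None if value == "-" else value


def parse_ecommerce_log(lines):
    """Group product lines under the transaction line that precedes them."""
    transactions = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("!"):
            values = [_clean(v) for v in line[1:].split("\t")]
            record = dict(zip(TRANS_FIELDS, values))
            record["products"] = []
            transactions.append(record)
        elif transactions:  # product line belonging to the last transaction
            values = [_clean(v) for v in line.split("\t")]
            transactions[-1]["products"].append(dict(zip(PRODUCT_FIELDS, values)))
    return transactions
```

All values are kept as strings here; converting totals and prices to decimal numbers is left to the caller.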
E-commerce Visitor Correlation 

In order to provide the ROIR reporting described above, 
the system 100 performs a special correlation between the 
e-commerce transaction data in the e-commerce log file 580 
and normal website visitor traffic data in the standard log 
files 510. 

As discussed above, both the standard log files 510 and 
the e-commerce log files 580 are processed by the log engine 
200. As discussed above in connection with FIGS. 3-9, each 
line of the log files 510 and 580 is processed and passes 
through the following steps: (1) the log line 512 of the log 
file 510 or 580 is read into the database buffer 250; (2) 
depending on the format of the log file, the log line 512 is 
processed and identified; (3) the website identification module is used 
if multiple websites are logged into the same log file 510 or 
580; (4) the visitor identification module uses the IP number 
and a timestamp found in the log line 512 (or session id) to 
establish the unique identity of the visitor; (5) the visitor ID 
is used to determine the record number in the visitor table 
310'; and (6) the record is updated with the information from 
the log line 512. 
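
The visitor identification step can be sketched as follows. This is a simplified illustration under stated assumptions, not the visitor identification module itself: the 30-minute inactivity timeout, the class name and the method name are all hypothetical. The sketch treats hits from the same IP number (or session id, when one is available) as the same visitor as long as they arrive within the timeout of the previous hit.

```python
SESSION_TIMEOUT = 30 * 60  # seconds; an assumed value, not from the text


class VisitorIdentifier:
    """Assign a visitor ID from the IP number and timestamp of each hit."""

    def __init__(self, timeout=SESSION_TIMEOUT):
        self.timeout = timeout
        self._last_seen = {}  # key -> (visitor_id, timestamp of last hit)
        self._next_id = 0

    def identify(self, ip, timestamp, session_id=None):
        # Prefer the session id when the log line carries one.
        key = ("sid", session_id) if session_id is not None else ("ip", ip)
        entry = self._last_seen.get(key)
        if entry is not None and timestamp - entry[1] <= self.timeout:
            visitor_id = entry[0]          # existing visitor
        else:
            visitor_id = self._next_id     # new visitor record
            self._next_id += 1
        self._last_seen[key] = (visitor_id, timestamp)
        return visitor_id
```

The returned ID plays the role of the record number that is looked up in the visitor table before the record is updated.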

FIG. 26 shows the visitor table 310' in the database buffer 
250. As discussed above, the visitor table 310' may include 
many fields, such as Hits 3000, Bytes 3010, Pages 3020, 
Dollars 3030, Referrals 3040, Domain 3050, Browser 3060, 
etc. The visitor table 310' is where the e-commerce corre- 
lation is done. 

The e-commerce log file 580 will update the visitor's 
Dollars field 3030, which indicates money spent by the 
visitor. The remaining fields are updated using the standard 
log file 510. The Dollars field 3030 is used to determine 
money spent on the website in terms of the other fields 
(parameters). 

For example, the Referral field 3040 in the visitor table 
310' holds a record number to an entry in the referral data 
table 3070. The referral in the referral data table 3070 
indicates how the visitor found the website. For example, if 
the visitor came from the yahoo.com™ website, then the 
Referral field 3040 in the visitor table 310' would hold the 
record number pertaining to the yahoo.com™ entry in the 
referral data table 3070. All visitors that came from 
yahoo.com™ would have the same referral record number in the 
Referral field 3040. Similarly, the Domain and Browser fields 
3050 and 3060 in the visitor table 310' would hold record 
numbers to entries in the domain data table 3080 and 
browser data table 3090. The other fields 3000, 3010 and 
3020 would likewise have data tables associated with them 
(not shown). 

By looping over the visitor table 310', a money amount 
can be associated with each entry in any of the data tables. 
If, for example, a money amount is associated with each 
entry in the referral data table 3070, all shoppers that came 
from yahoo.com™ (as an example) would be aggregated to 
produce a return-on-investment indicator. 
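
The loop over the visitor table can be illustrated with the following sketch. The in-memory layout (a list of dictionaries for the visitor table and a list for the data table, indexed by record number) is an assumption made for the example and is not the storage format of the database 300.

```python
from collections import defaultdict


def dollars_by_parameter(visitor_table, field, data_table):
    """Sum each visitor's dollars under the data table entry that the
    visitor's `field` record number points to."""
    totals = defaultdict(float)
    for visitor in visitor_table:
        record_number = visitor[field]
        totals[record_number] += visitor["dollars"]
    # Translate record numbers back into the data table entries.
    return {data_table[rec]: amount for rec, amount in totals.items()}
```

With a referral data table of ["yahoo.com", "example.org"], two visitors pointing at record 0 who spent 20 and 10 dollars are aggregated into a single 30-dollar figure for yahoo.com.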

FIG. 27 shows an example of an ROIR e-commerce report 
generated by the system 100 of the present invention. The 
report 3100 uses the domain data table 3080, shown in FIG. 
26, to produce a top-10 report of Internet domains whose 
visitors spent the most money on the website represented by 
the report 3100. In the example report 3100, Aol.com™ is 
the top domain in terms of money, accounting for approximately 
46% of all money spent on the website. 

The total money spent by all the visitors for each domain 
is displayed when the "Dollars" tab 3110 is selected. The 
average amount of money spent by each visitor at each 
domain can also be displayed by selecting the "Dollars/Visitor" 
tab 3120. The average amount of money spent by each 
visitor is calculated by dividing the total amount of money 
spent at each domain by the number of visitors to the 
domain. 
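
The two tab views can be sketched as one small calculation. The input layout (one (domain, dollars) pair per visitor) and the function name are assumptions for illustration; the division of total dollars by the number of visitors follows the description above, and the share of all money spent is included only to mirror the percentage quoted for the example report.

```python
from collections import defaultdict


def dollars_per_domain(visitors, top_n=10):
    """visitors: iterable of (domain, dollars) pairs, one pair per visitor."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for domain, dollars in visitors:
        totals[domain] += dollars
        counts[domain] += 1
    grand_total = sum(totals.values())
    rows = []
    for domain, total in totals.items():
        rows.append({
            "domain": domain,
            "dollars": total,                               # "Dollars" tab
            "dollars_per_visitor": total / counts[domain],  # "Dollars/Visitor" tab
            "share": total / grand_total if grand_total else 0.0,
        })
    rows.sort(key=lambda row: row["dollars"], reverse=True)
    return rows[:top_n]
```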

E-commerce website owners can use these correlations to 
make valuable business decisions. The system and method 
of the present invention can correlate money to keywords, 
banner ads, search engines, referrals, domains, countries, 
browsers, platforms, or any other parameter of interest. The 
website operators can monitor the performance of search 
engine registrations, banner ad placements, regional ad 
campaigns, and more. 

User Interfaces/System Reports 

Examples of preferred user interfaces and system reports 
will now be discussed. All reports and interfaces are 
preferably web-based and viewed with a web browser. 
While not all possible reports are shown, the reports shown 
are representative of the types of reports and report con- 
figurations that are possible with the system and method of 
the present invention. Accordingly, it should be appreciated 
that the configuration and types of reports, as well as the 
configuration and types of user interfaces may vary from 
those shown while still falling within the scope of the 
present invention. 

Further, the user interfaces described below are for 
generation of static reports. The user interfaces used for 
real-time reports were described above in connection with 
FIGS. 22-25. 

FIG. 28 shows a preferred browser-based user interface 
4000. This is preferably the first user interface 4000 shown 
when the user first accesses the reporting interface of the 
system 100. The user interface 4000 preferably contains 
areas 4020 and 4030 for displaying product and/or company 
logos. The user interface 4000 also includes a main reporting 
window 4100 for displaying a currently chosen report. 

The user interface 4000 preferably includes a navigation 
area 4040 that contains a collection of menus that group the 
available reports into different categories, preferably seven 
main categories, each with an associated link 4050: Traffic; 
Pages; Referrals; Domains; Browsers; Tracking; and 
E-Commerce. A collection of links to specific reports 4060 
related to a chosen category link 4050 is preferably 
displayed under the chosen category link 4050. The currently 
chosen report link 4070 is preferably indicated by a change 
in color or shading. In the example shown in FIG. 28, the 
currently chosen report link 4070 corresponds to the 
"Snapshot" report. 

The user interface 4000 preferably includes a "date range" 
functions area 4080. Depending on the report chosen, this 
date range functions area 4080 allows the user to select the 
date range of the report being shown. The user interface also 
preferably includes a controls area 4090 that preferably 
includes preferences and report exporting features. The 
preferences function of the controls area 4090 allows the 
user to change report settings, such as the language that is 
used for display. The exporting function of the controls area 
4090 allows the user to export the currently viewed data for 
use in other applications, such as Microsoft Excel™. 

The user interface 4000 also preferably includes a Help 
Information area 4130, which gives a brief synopsis of the 
report being displayed and provides a link 4135 for more 
in-depth information. 

Traffic Related Reports 

The Snapshot report 4010 shown in FIG. 28 is preferably 
a bar graph 4110 of the last 7 days of web site traffic in terms 
of various fields, preferably Visitors, Pageviews, Hits, or 
Bytes. There are preferably tab controls 4120 on the report 
4010 that allow the user to select which field is displayed. 
The date of each day is preferably shown below the bars in 
the graph 4110. 

FIG. 29 shows an example of an Hourly Graph report 
4200. The Hourly Graph report preferably shows traffic 
versus hour of the day in terms of various fields, preferably 
Visitors, Pageviews, Hits, or Bytes. There are preferably tab 
controls 4120 on the report 4200 that allow the user to select 
which field is displayed. 

The Hourly Graph report 4200 is preferably a bar graph 
indicating the 24 hours of the day from left to right. This 
report allows administrators to see when peak activity is 
expected and when to plan site maintenance and upgrades. 

Other reports available under the Traffic category 
preferably include the Summary, Daily Graph, Monthly Graph and 
Top Servers reports. The Summary report gives a text based 
summary of overall traffic to the site. The Daily Graph is 
similar to the Hourly Graph report 4200, except that the 
traffic is displayed as a function of the day of the month. The 
Monthly Graph report provides traffic displayed versus 
month of the year, and the Top Servers report indicates 
which log files or servers are responsible for the most traffic 
in the cluster. 

Pages Related Reports 

FIG. 30 shows an example of a Top Pages report 4300. 
The Top Pages report 4300 is one of the reports listed under 
the Pages menu 4310. The Top Pages report 4300 preferably 
indicates a top-ten type list, ranking which pages in the 
website are the most visited. The tabs 4120 are preferably 
used to view the report 4300 in terms of either Pageviews or 
Bytes transferred. Next and previous buttons 4320 are 
preferably provided that allow the user to scroll through the 
Top Pages Report 4300. The number of entries shown is 
preferably adjusted with the #Shown menu 4330. 

FIG. 31 shows an example of a Directory Tree Report 
4400. The Directory Tree Report 4400 is similar to the top 
pages report 4300 of FIG. 30, except that the Directory Tree 
Report 4400 preferably includes links 4410 next to each 
entry that can be selected to open information below that 
entry. This allows for easy display and navigation of 
hierarchical type data, such as a directory structure. 

The directory tree report 4400 indicates which directories 
within the website architecture are being accessed the most. 
Under each directory, the end user can drill down to see the 
subdirectories or individual pages contained within the 
primary directory by selecting the links 4410. 

Other pages-related reports in the Pages menu 4310 
preferably include File Types, Status/Errors, and Posted 
Forms. The File Types report is a top-ten type report that 
indicates which file extensions or types are accessed the 
most. This allows the user to distinguish between HTML 
pages, GIF images, etc. The Status/Errors report is a tree-type 
report that indicates status codes and error messages that 
occur during web content delivery. The Posted Forms report 
is a top-ten type report that indicates the forms that were 
submitted using the POST method as defined in the HTTP 
protocol. 

Referrals Related Reports 

FIG. 32 shows an example of a Search Engine report 4500 
from the Referrals menu 4510 of the navigation area 4040. The 
Referrals menu 4510 provides reports related to how the 
visitor found a website. 

The Search Engines report 4500 contains a tree-type list 
of the most used search engines. Each search engine can 
then be expanded to see which keywords were used during 
those searches. 

Additional reports in the Referrals menu 4510 preferably 
include Top Referrals, Top Keywords, and the Referral Tree. 
The Top Referrals report is a simple top-ten type list of the 
top referring URLs. The Keywords report indicates the top 
keywords used across all search engines. The Referral Tree 
report breaks down the Referral URLs by domain. 

Domains Related Report 

FIG. 33 is an example of a Top Domains report 4600, 
which indicates regional and network information about the 
visitors. The visitor's domain is determined by the IP 
address of the visitor. The domain is resolved using the 
Reverse DNS module 260 within the log engine 200 
described previously. 

Additional reports under the Domains menu 4610 in the 
navigation area 4040 preferably include Domain Tree and 
Top Countries. The Domain Tree report provides the 
different levels of domains. Primary domains such as .com and 
.edu are shown first. Preferably, these can be expanded to 
show detailed information within. The Top Countries report 
expands and analyzes which countries people are coming 
from. 

Browsers Related Reports 

FIG. 34 shows an example of a Browser Tree report 4700, 
which is a tree-type report that ranks the most widely used 
browsers by visitor to the website. Browsers such as Internet 
Explorer™ and Netscape™ are reported upon as a whole 
and by version. Each primary browser can be expanded to 
see the breakdown by version. 

Additional reports in the Browsers menu 4710 of the 
navigation area 4040 preferably include Platform Tree and 
Top Combos. The Platform Tree report indicates the 
operating system of the visitor. It is a tree-type report that can be 
expanded to show the versions under each platform. The Top 
Combos report ranks the correlation between browser and 
platform. 

Tracking Related Reports 

FIG. 35 shows an example of a Top Entrances report 
4800. As part of the Tracking menu 4810 within the 
navigation area 4040, the Top Entrances report 4800 indicates 
the starting point of visitors in the website. Additional 
reports in the Tracking section 4810 preferably include Top 
Exits, Click Through, Depth of Visit, Length of Visit, and 
Usernames. 

The Top Exits report provides a list of the last page 
visitors looked at before leaving the site. The Click Through 
report indicates the click percentage from any one page to 
another. The Depth of Visit report provides a histogram of 
the number of pages viewed by visitors. The Length of Visit 
report provides a histogram of the time spent on the site by 
visitors. The Usernames report analyzes the usage of 
password protected areas of a website by listing the usernames 
that were used to log in to those sections. 

E-Commerce Related Reports 

FIG. 36 shows an example of a Top Products report 4900, 
which is part of the E-Commerce menu 4910 in the 
navigation area 4040. The Top Products report 4900 indicates the 
Top Products purchased from the site by revenue. Additional 
reports in the E-Commerce menu 4910 preferably include 
Totals, Product Tree, Regions, and Top Stores. The Totals 
report gives a summary of overall e-commerce activity. The 
Product Tree report groups products by category. The 
Regions report indicates the regional location of shoppers 
including cities, states and countries. If multiple store fronts 
are used by the same shopping system, the Top Stores report 
can break down revenue by storefront. 

System Integration 

The system and method of the present invention can be 
configured in many different ways. From single server 
configurations to complex load balancing systems, the 
system and method of the present invention is flexible in its 
integration abilities. While it is difficult to catalog every 
possible architecture, several possible configurations are 
described below. 

Webserver vs. Dedicated Server 

The system and method of the present invention can be 
implemented directly on the web server 500 that produces 
the log files (510, 580), or on a separate dedicated computer. 
If the system 100 is implemented directly on the web server 
500, it can then use the web server 500 for the reporting web 
server 520. If the system 100 is implemented on a dedicated 
box, then a web server 520 will need to be configured on the 
dedicated computer in order to service the report requests. 

Access to log files is slightly more complicated on a 
dedicated computer. If the system 100 is implemented on a 
dedicated computer, then the log files (510, 580) from the 
web server 500 will need to be accessible to the dedicated 
computer by using FTP, NFS, or some other suitable disk 
access method. Real-time processing of log files requires 
writing permission to the log files (510, 580) which may 
require an extra configuration step if using a dedicated 
computer. 

As long as the log files (510, 580) are accessible (with 
permissions) and a web server is available, the system 100 
can work just as well directly on the web server 500 or on 
a dedicated computer. 

One Website vs. Multiple Websites 

The system and method of the present invention can 
handle multiple websites. During integration, a unique 
reporting directory for data storage can be configured for 
each of the websites. The system 100 will link the individual 
report directories back to the main installation, so that there 
is only one copy of the templates and icons. Users will need 
internet access to the reporting directories. Thus, the web 
server 520 configuration should be similar to the system 100 
configuration. A typical installation will use a subdirectory 
within each website's document root to store and access the 
reports. 

Whether there is one website or many, the integration 
preferably provides a unique web accessible directory for 
each website configuration. 

Distributed Logs vs. Central Logs 

Web servers 500 can be configured to create unique log 
files (510, 580) for each website in the web server's 
configuration, or a single log file (510, 580) for all websites 
in the configuration. The system of the present invention can 
be configured to work with either of these architectures. If 
each website has its own unique log file, then the log files are 
preferably entered into the system's 100 configuration, so 
that each website has its own area in the configuration. The 
system 100 will process the logs one at a time treating each 
website independently. 

If the web server 500 is configured to log centrally, then 
the log file (510, 580) preferably contains some website 
identification marker in order for the system 100 to be able 
to sort and process the log file 510. As described previously, 
the website identification module 220 is designed to capture 
some parameter within the log file, in order to determine 
which hits go with which websites. This type of integration 
can automatically detect new websites as they are added to 
the web server 500 without modifying the configuration of 
the system 100. 

Single Log vs. Multi-Log 

The system and method of the present invention can be 
configured for systems that reside on one web server 500 or 
on multiple web servers 500. Multiple web servers 500 are 
often used for load-balancing, redundancy, and functional 
serving. Multiple web servers 500 will each have their own 
set of logs 510. The system and method of the present 
invention can automatically correlate the visitor centric data 
from multiple logs (510, 580), as described previously. By 
simply entering the multiple logs in the configuration, the 
system 100 will process the multiple logs. 

E-Commerce vs. No-Commerce 

As described previously, the system and method of the 
present invention can include e-commerce reporting 
functionality, and can be used in conjunction with shopping 
cart software. The e-commerce log files 580 are handled 
similarly to the multi-log architecture discussed above. The 
e-commerce logs 580 are simply treated as multiple logs. 
Additional entries will need to be made in the configuration. 

For integration into e-commerce systems, the shopping 
cart software is preferably configured to create the preferred 
log file format described above. 

Control Panel vs. Stand-Alone 

Many larger hosting providers are creating centralized 
web-based control panels that contain links to all of the tools 
and systems available to the hosting clients. Hosting clients 
log into the control panel once and are provided with 
customized information and interaction, such as accessing 
their unique e-mail account, uploading files to their unique 
website, and viewing the reports created by the system of the 
present invention. 

Stand-alone systems will have unique reporting 
directories for each website. Thus, accessing the reporting area is 
simple, as each reporting area will have a unique URL. 
Protecting report access can be accomplished through the 
web server 520 itself, and does not require integration with 
the system 100. 

For control panel integrations, the system and method of 
the present invention is preferably sensitive to session 
controlling technology. As described previously, the session 
parser module 1420 has the ability to detect custom 
variables and control report delivery from a central location. 

The various components of the present invention are 
preferably implemented on internet (e.g., web) servers, 
which may be or include, for instance, a work station 
running the Microsoft Windows™ NT™, Windows™ 2000, 
UNIX, LINUX, XENIX, IBM, AIX, Hewlett-Packard 
UX™, Novel™, Sun Micro Systems Solaris™, OS/2™, 
BeOS™, Mach, Apache Open Step™, or other operating 
system or platform. However, the various components of the 
present invention could also be implemented on a 
programmed general purpose computer, a special purpose 
computer, a programmed microprocessor or microcontroller 
and peripheral integrated circuit elements, an ASIC or other 
integrated circuit, a hardwired electronic or logic circuit 
such as a discrete element circuit, a programmable logic 
device such as a FPGA, PLD, PLA, or PAL, or the like. In 
general, any device on which a finite state machine capable 
of implementing the modules and control routines discussed 
above can be used to implement the present invention. 

While the foregoing description includes many details and 
specificities, it is to be understood that these have been 
included for purposes of explanation only, and are not to be 
interpreted as limitations of the present invention. Many 
modifications to the embodiments described above can be 
made without departing from the spirit and scope of the 
invention as is intended to be encompassed by the following 
claims and their legal equivalents. 

What is claimed is: 

1. A system for analyzing and monitoring internet traffic, 
comprising: 

a relational database; and 

a log engine that processes log files received from at least 
one internet server and stores data processed from the 
log files in the relational database; 

wherein the log engine, when new log file data is present 
in the log file processes said new log file data and 
determines an end of file location on the log file, and, 
when new log file data is not present in the log file, 
periodically checks the log file at predetermined time 
intervals to check for new log file data, and commences 
processing of any new log file data at a most recent 
determined end of file location on the log file. 

2. The system of claim 1, wherein the relational database 
comprises a plurality of hash tables. 

3. The system of claim 1, wherein the plurality of tables 
comprise: 

a visitor table that stores traffic information generated by 
a visitor to an internet site hosted by the at least one 
internet server; and 

a plurality of data tables, wherein each data table stores 
records related to a respective parameter. 

4. The system of claim 3, wherein the visitor table 
comprises at least one pointer to at least one record stored in 
at least one of the data tables. 

5. The system of claim 3, wherein the respective 
parameters comprise: 

domain names from which the visitor originated; and 

intervals to check for new log file data, and commences 
processing of any new log file data at a most recent 
determined "end of file" location. 

7. The system of claim 6, wherein the visitor centric 
database comprises a plurality of hash tables. 

8. The system of claim 6, wherein the plurality of hash 
tables comprise: 

a visitor table that stores traffic information derived from 
the hits, wherein the visitor table contains a unique 
visitor record for each visitor; and 

a plurality of data tables, wherein each data table stores 
data related to a respective non-unique parameter. 

9. The system of claim 8, wherein the visitor table 
comprises at least one pointer to at least one record stored in 
at least one of the data tables. 

10. The system of claim 8, wherein the respective 
non-unique parameters comprise: 

domain names from which the visitors originated; 

web browsers used by the visitors; and 

other internet sites that referred the visitors to the at least 
one internet site. 

11. The system of claim 8, wherein the log engine 
comprises a visitor identifier that determines if a hit 
originates from a new visitor or an existing visitor. 

12. The system of claim 11, wherein the visitor identifier 
is adapted to create a new visitor record if a hit originates 
from a new visitor. 

13. The system of claim 6, wherein the log engine 
comprises a database buffer that temporarily stores the traffic 
data derived from the hits recorded in the log files. 

14. The system of claim 13, wherein the log engine further 
comprises a database updater that transfers the traffic data 
temporarily stored by the database buffer to the visitor centric 
database. 

15. The system of claim 14, wherein the database updater 
sorts the traffic data temporarily stored in the database buffer 
before transferring the traffic data to the visitor centric 
database. 

16. The system of claim 6, wherein the log engine 
comprises a log parser that reads log lines in the log files, 
and separates each log file into individual fields. 

17. The system of claim 6, further comprising a report 
engine that generates reports using the traffic data stored in 
the visitor centric database. 

18. The system of claim 17, wherein the report engine is 
adapted to generate reports that correlate money spent by a 
visitor to any other parameter of the traffic data. 

web browsers used by the visitor; and * 9 ' ^ e svstem of claim 1 J> whereio th u e re P° i rt is 

. . ...... r j i . . 4 .t. • . 50 adapted to generate a top products report that ranks products 

other internet sites that referred the visitor to the internet « , f . .. , , . j i_ *u 

s - te purchased by visitors based on revenues generated by the 

6. A system for analyzing and monitoring internet traffic * f i • n u • *u * • 

. "j < . . 4 . A , . . . . « . j i_ 20. The system of claim 17, wherein the report engine is 

generated by visitors to at least one internet site hosted by at ad d tQ efate at ^ Qne rf a ^ oduct 

least one internet server, comprising: _ . , • , , ' 

9 * & 55 tree report, regions report, and top scores report. 

a visitor centric database; and 21. The system of claim 17, wherein the report engine is 

a log engine that receives log files from the at least one adapted to generate a report that displays a value of at least 

internet server, processes hits logged in each log file, one traffic data parameter over at least one predetermined 

and stores traffic data derived from the hits in the visitor time period. 

centric database, wherein the visitor centric database 60 22. The system of claim 21, wherein the report comprises 

associates the traffic data derived from the hits with a a snapshot report in which the at least one predetermined 

visitor that generated the hit; time period comprises seven consecutive 24 hour time 

wherein the log engine, when new log file data is present periods. 

in the log file, processes said new log file data and 23. The system of claim 21, wherein the report comprises 

determines an "end of file" location on the log file, and, 65 an hourly graph report in which the at least one predeter- 

when new log file data is not present in the log file, mined time period comprises a plurality of consecutive one 

periodically checks the log file at predetermined time hour time periods. 
24. The system of claim 17, wherein the report engine is
adapted to generate a top pages report that ranks website 
pages based on number of visitors to the website pages. 

25. The system of claim 24, wherein each entry in the top 
pages report comprises a link for accessing additional infor- 
mation about a respective website page. 

26. The system of claim 17, wherein the report engine is 
adapted to generate a search engine report that displays a list 
of most used search engines. 

27. The system of claim 17, wherein the report engine is 
adapted to generate a top domains report that displays 
regional and network information about the visitors. 

28. The system of claim 17, wherein the report engine is 
adapted to generate a browser tree report that ranks internet 
browsers based on which internet browsers are used most by 
visitors to a website. 

29. The system of claim 28, wherein each internet browser 
entry in the browser tree report includes a link for accessing 
information about different versions of a respective internet 
browser. 

30. The system of claim 17, wherein the report engine is
adapted to generate a top entrances report that ranks starting
points of visitors to a website based on most used starting
points.

31. The system of claim 17, wherein the report engine is 
adapted to generate at least one of a summary report, a daily 
graph report, a monthly graph report, a top servers report, a 
file types report, a status/errors report, a posted forms report, 
a top referrals report, a top keywords report, a referral tree 
report, a domain tree report, a top countries report, a 
platform tree report, a top combos report, a top exits report, 
a click through report, a depth of visit report, a length of visit 
report, and a usernames report. 

32. The system of claim 17, wherein the report engine 
comprises: 

a template module that stores report templates; 

a session parser that receives report requests from the at 
least one server, and determines a type of report 
requested, data needed to generate a requested report 
and a format for the requested report; 

an authenticator that receives an identity of a report 
requester from the session parser, and verifies that the 
report requester has permission to view a requested 
report; 

a data query module that receives authentication infor- 
mation from the authenticator, and that queries the 
database for data needed to generate the requested 
report if the report requester has permission to view the 
requested report; and 

a format output module that receives the data needed to 
generate the requested report from the database, 
retrieves templates for the requested report from the 
template module, creates the requested report, and 
delivers the requested report to the report requester. 

33. The system of claim 32, wherein the template module 
also stores at least one dictionary. 

34. The system of claim 33, wherein the format output 
module is adapted to create the requested report in a select- 
able language using the at least one dictionary. 

35. The system of claim 6, wherein the log engine is 
configured to process hits from multiple internet sites that 
are logged to a single log file. 

36. The system of claim 6, wherein the log engine 
comprises a website identifier that identifies a source of each 
hit. 

37. The system of claim 6, wherein the log engine 
comprises a domain name system (DNS) resolver that 
determines host and domain information for each visitor. 
38. The system of claim 37, wherein the DNS resolver
utilizes reverse DNS resolution to determine the host and 
domain information for each visitor. 

39. The system of claim 6, wherein the log engine is 
adapted to process e-commerce log files that contain infor- 
mation on money spent by a visitor. 

40. An article of manufacture, comprising: 

a computer usable medium having computer readable 
program code embodied therein for analyzing and 
monitoring internet traffic generated by visitors to at 
least one internet site hosted by at least one internet 
server, the computer readable program code in the 
article of manufacture comprising: 

computer readable program code for receiving log files
from the at least one internet server;

computer readable program code for processing hits
logged in each log file by:

initiating a process loop when new data is present in
the log file, during which the new data is processed
and an "end of file" location is determined
for use as a starting point for subsequent new data
processing, and

initiating a wait loop when new data is not present in
the log file, wherein the wait loop delays data
processing for a predetermined time interval
before checking for new data in the log file;

computer readable program code for storing traffic data
derived from the hits in a database; and

computer readable program code for associating the
traffic data derived from the hits and stored in the
database with a visitor that generated the hit.

41. The article of manufacture of claim 40, wherein the 
database comprises a plurality of hash tables. 

42. The article of manufacture of claim 41, wherein the 
plurality of hash tables comprise: 

a visitor table that stores traffic information derived from 
the hits, wherein the visitor table contains a unique 
visitor record for each visitor; and 

a plurality of data tables, wherein each data table stores 
data related to a respective non-unique parameter. 

43. The article of manufacture of claim 42, wherein the 
visitor table comprises at least one pointer to at least one 
record stored in at least one of the data tables. 

44. The article of manufacture of claim 42, wherein the 
computer readable program code for processing hits logged 
in each log file comprises computer readable program code 
for determining if a hit originates from a visitor with a 
preexisting visitor record in the database. 

45. The article of manufacture of claim 44, wherein the 
computer readable program code for determining if a hit 
originates from a visitor with a preexisting visitor record in 
the database creates a new visitor record if a hit originates 
from a visitor without a preexisting visitor record in the 
database. 

46. The article of manufacture of claim 40, further com- 
prising computer readable program code for temporarily 
storing the traffic data derived from the hits logged in the log 
files. 

47. The article of manufacture of claim 46, wherein the 
computer readable program code for storing traffic data 
derived from the hits in a database comprises computer 
readable program code for transferring the temporarily 
stored traffic data to the database. 

48. The article of manufacture of claim 47, wherein the 
computer readable program code for transferring the tem- 
porarily stored traffic data to the database sorts the tempo- 
rarily stored traffic data before transferring the traffic data to 
the database. 
49. The article of manufacture of claim 40, wherein the
computer readable program code for processing hits logged
in each log file further comprises computer readable program
code for reading log lines in the log files, and for
separating each log line into individual fields.

50. The article of manufacture of claim 40, further comprising
computer readable program code for generating
reports using the associated traffic data stored in the database.

51. The article of manufacture of claim 40, wherein the
computer readable program code for processing hits logged
in each log file processes hits originating from multiple
internet sites and logged to a single log file.

52. The article of manufacture of claim 40, wherein the
computer readable program code for processing hits logged
in each log file comprises computer readable program code
for identifying a source of each hit.

53. The article of manufacture of claim 40, wherein the
computer readable program code for processing hits logged
in each log file comprises computer readable program code
for determining host and domain information for each
visitor.

54. The article of manufacture of claim 53, wherein the
DNS resolver means computer readable program code for
determining host and domain information for each visitor
utilizes reverse DNS resolution to determine the host and
domain information for each visitor.

55. A system for analyzing and monitoring internet traffic
generated by visitors to at least one internet site hosted by at
least one internet server, comprising:

a database;

a log engine that receives log files from the at least one
internet server, processes hits logged in each log file,
and stores traffic data extracted from the processed hits
in the database, wherein the log engine comprises,

a database buffer that temporarily stores traffic data
received from the database,

a log parser that processes each hit in each log file, and
separates each hit into its individual fields, wherein
the log parser, when new log file data is present in the
log file, processes said new log file data and determines
an "end of file" location on the log file, and,
when new log file data is not present in the log file,
periodically checks the log file at predetermined time
periods to check for new log file data, and commences
processing of any new log file data at a most
recent determined "end of file" location.

a visitor identifier that receives each hit's individual
fields from the log parser, identifies each hit as
originating from either a new visitor or an existing
visitor, and creates a new visitor record in the database
buffer if a hit originates from a new visitor,

a buffer updater that, prior to processing a new log file,
copies previously stored data from the database to
the database buffer, and wherein, for each hit, the
buffer updater locates in the database buffer the
visitor record identified or created by the visitor
identifier for a respective hit, and updates the identified
or created visitor record in the database buffer
with traffic data derived from the respective hit, and

a database updater that copies updated traffic data from
the database buffer to the database after all hits in the
new log file have been processed; and

a report engine that generates reports using the traffic
data stored in the database.

56. The system of claim 55, wherein the log engine further
comprises a website identifier that identifies a source of each
hit.

57. The system of claim 56, wherein the website identifier
identifies the source of each hit from website identifier text
received from the log parser for each hit.

58. The system of claim 55, wherein the log engine further
comprises a domain name system (DNS) resolver that
determines host and domain information for each visitor to
an internet site.

59. The system of claim 58, wherein the DNS resolver is
adapted to process multiple DNS queries in parallel.

60. The system of claim 55, wherein the log engine is
adapted to process e-commerce log files that contain information
on money spent by the visitor.

61. A method of analyzing and monitoring internet traffic
generated by visitors to at least one internet site hosted by at
least one internet server, comprising the steps of:

receiving log files from the at least one internet server;

processing hits logged in each log file, as each hit is
logged to each log file, by:

(a) processing new hits present in a log file,

(b) determining an "end of file" location on the log file,

(c) waiting for a predetermined time period if no new
hits are present in the log file,

(d) checking for new hits in the log file after the
predetermined time period, and

(e) processing any new hits discovered in the log file by
starting at the determined end of file location in the
log file;

storing, in a database, traffic data derived from the hits;
and

associating the traffic data derived from the hits and stored
in the database with a visitor that generated the hit.

62. The method of claim 61, further comprising the step
of generating reports using the associated traffic data stored
in the database.

63. The method of claim 61, wherein the traffic data
derived from the hits is stored in hash tables.

64. The method of claim 63, wherein traffic information
derived from the hits are stored in a visitor hash table that
contains a unique visitor record for each visitor, and wherein
data related to at least one non-unique parameter is stored in
respective data tables.

65. The method of claim 64, wherein the visitor hash table
comprises at least one pointer that points to at least one
record stored in at least one of the data tables.

66. The method of claim 64, further comprising the steps
of:

determining if a hit originates from a visitor with a
preexisting visitor record in the database; and

creating a new visitor record if the hit originates from a
visitor without a preexisting visitor record in the database.

67. The method of claim 61, further comprising the step
of temporarily storing the traffic data derived from the hits
in a buffer prior to storing the traffic data in the database.

68. The method of claim 67, further comprising the step
of sorting the traffic data stored in the buffer prior to storing
the traffic data in the database.

69. The method of claim 61, wherein the step of processing
hits logged in each log file comprises the steps of:

reading log lines in the log files; and

separating each log line into individual fields.

70. The method of claim 69, wherein the hits logged in
each log file are processed in real time as each hit is logged
to a log file.

71. The method of claim 61, further comprising the step
of identifying a source from which each hit originates.

72. The method of claim 61, further comprising the step
of determining host and domain information for each visitor. 

73. The method of claim 72, wherein host and domain 
information for each visitor is determined using reverse 
domain name system (DNS) resolution. 

74. A method of processing a log file to obtain traffic data, 
comprising the steps of: 

copying previously stored traffic data from a database to
a database buffer;

separating hits logged in the log file into individual fields,
wherein each hit is processed as it is logged to the log
file by:

(a) processing new hits present in the log file, 

(b) determining an "end of file" location in the log file, 

(c) waiting for a predetermined time period if no new 
hits are present in the log file, 

(d) checking for new hits in the log file after the 
predetermined time period, 

(e) processing any new hits discovered in the log file by 
starting at the end of file location in the log file, and 

(f) repeating steps (a)-(e); 

identifying each hit as originating from either a new 
visitor or an existing visitor; 

creating a new visitor record in the database buffer if a hit
originates from a new visitor;

for each hit, locating the visitor record identified or
created and updating the identified or created visitor
record in the database buffer with traffic data derived
from the respective hit; and

copying updated traffic data from the database buffer to
the database after all hits in the log file have been
processed.

75. The method of claim 74, further comprising the step 
of generating a report based on the traffic data in the 
database. 

76. The method of claim 74, wherein hits originating from 
multiple sources are logged to the log file. 

77. The method of claim 76, further comprising the step 
of identifying a source from which each hit originates. 

78. The method of claim 74, further comprising the step 
of determining host and domain information for each visitor. 

79. The method of claim 78, wherein host and domain 
information for each visitor is determined using reverse 
domain name system (DNS) resolution. 
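
The polling steps (a)-(f) recited in claim 74 can be sketched roughly as
follows. This is a minimal illustration only, not the patented
implementation: the function names process_new_hits and follow_log, the
handle_hit callback, and the poll_seconds and max_polls parameters are
all hypothetical, and the sketch assumes that the "end of file" location
is kept as a byte offset and that each hit occupies one newline-terminated
log line.

```python
import time


def process_new_hits(path, offset, handle_hit):
    """Process complete log lines appended after `offset`.

    Returns the new "end of file" location (a byte offset, by assumption),
    which is the starting point for the next pass.
    """
    with open(path, "rb") as log:
        log.seek(offset)
        while True:
            line = log.readline()
            if not line.endswith(b"\n"):
                # No data, or a partially written line: leave it for later.
                break
            offset = log.tell()
            handle_hit(line.decode("utf-8", errors="replace").rstrip("\r\n"))
    return offset


def follow_log(path, handle_hit, poll_seconds=5.0, max_polls=None):
    """Repeat: process new hits, and wait when none are present.

    `max_polls` is a hypothetical knob that bounds the loop so the sketch
    can terminate; a real log engine would presumably run indefinitely.
    """
    offset = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        new_offset = process_new_hits(path, offset, handle_hit)
        if new_offset == offset:
            # No new hits: wait for the predetermined time period.
            time.sleep(poll_seconds)
        offset = new_offset
        polls += 1
    return offset
```

Remembering the offset rather than re-reading the file is what allows each
pass to start at the most recently determined end of file location.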

WEB SITE WITH AUTOMATIC RATING
SYSTEM 

BACKGROUND OF THE INVENTION 

A. Field of Invention 

This invention pertains to a web site host and method of 
operating the same which provides an automatic rating 
system for some of its contents. The rating system is used to 
generate a rating indicium which is sent to the content 
provider and/or to generate a payment therefor. 

B. Description of the Prior Art 

Web sites on the Internet are fast becoming the preferred 
way of providing information to readers. While in the past, 
a person looking for information had to subscribe to and read 
numerous magazines and other printed media to obtain 
certain information, much of the same information is avail- 
able now on web sites. More particularly, articles covering 
virtually every facet of the business world as well as 
information related to relaxation, personal hobbies, vaca- 
tions and similar subjects related to our private world are 
being written and published electronically so that they are 
readily available to any one in the world with a telephone 
and a PC or a TV set. A major problem that plagues the 
publishers of such information is how to get paid for the 
contents being provided. The problem has been solved by 
providing simultaneously with the information commercial 
advertisement, using banners or other advertising devices. 
However, another problem that still remains is that it is
difficult, if not impossible, under present conditions to
determine whether or not the articles and/or advertisements being
provided are satisfactory to the readers. Therefore there is a
need in the field of electronic publishing for a web page
host and technique which can collect data from the
readers indicative of the perceived quality of its contents.

OBJECTIVES AND SUMMARY OF THE 
INVENTION 

In view of the above, it is the objective of the present 
invention to provide a system for presenting a web site 
which automatically collects specific qualitative information 
regarding the contents of the web site, including information 
concerning associated advertisement. 

A further objective is to provide a host which automatically
generates a data base accumulating and compiling
said information in an easily readable and informative
format.

Other objectives and advantages of the invention shall 
become apparent from the following description. An elec- 
tronic publishing system constructed in accordance with the 
present invention is used to display data over a computer 
based distributed network, said data including at least one 
article and/or advertisement. The system includes a receiv- 
ing element receiving ratings from a reader evaluating said 
article and advertisement to generate said ratings, a data 
storage element receiving and storing information related to 
a site including at least one article and a plurality of 
advertisements, said information including said ratings, and 
a totaling element arranged to total ratings from a plurality 
of readers to generate rating indicia. The indicia may
include, for example, a combined article rating
parameter for said article and/or a combined advertisement
rating for said advertisement based on ratings from a
plurality of readers. The indicia may further include data
indicative of the number of readers who have provided said
ratings and the percentage of readers who have rated the
articles or advertisements as being, for instance, excellent,
good, fair or no value. The system described above is used
to generate cumulative rating parameters on a host site of a
distributed network based multiple computer information
distribution system, said host site hosting an informational page
including a plurality of articles and advertisements, by
presenting said articles and advertisements to a plurality of
readers, receiving from said readers a rating associated with
at least one of said articles and advertisements, accumulating
said ratings received from said readers, generating a
cumulative rating parameter for articles and/or advertisements
for which responses have been received; and providing
said rating parameters to a requester together with
associated statistical information.

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 shows a block diagram of a web hosting site in 
accordance with this invention; 

FIG. 2 shows a typical web page with rating zones in
accordance with this invention;

FIG. 3 shows a flow chart for the operation of a system in 
accordance with this invention; 

FIGS. 4A-C show a typical table or data base for
organizing the ratings for the articles and advertisements
associated with the host site; and

FIG. 5 shows a flow chart for processing and delivery of 
the rating information. 

DETAILED DESCRIPTION OF THE
INVENTION

FIG. 1 shows a system 10 constructed in accordance with
this invention which is essentially a web site server consisting
of a microprocessor 12, a keyboard 14, a memory 16, a
display 18 and a content input device 20. The memory 16 is
used to save the data required to define a specific site. Each
site is typically formed of a plurality of web pages and is
defined using HTML or similar format. New content may be
added through the content input device 20 which may be a
floppy drive, or other similar data transfer device. The site
is published through an ISP interface 22 in the usual manner.

Memory 16 includes several files, each file defining a web
page of the subject site. For example, FIG. 2 shows how a
typical page may look to a reader accessing the site. This
particular page may describe to the reader how to install a
particular piece of hardware on a PC. The page has two
distinct zones. The main zone or portion of the page is the
text 100 which provides actual content or information
required by a reader. This text may include instructions on
a computer-related issue, but of course it may include any
kind of information, such as, but not restricted to:
Travelogues;
Recipes;
Reviews of a book, play, magazine, musical selection or
other literary criticism;
Actual literature text.

Also provided in zone 100 may be other types of information
besides text, such as graphics, audio and visual
information, etc. A second zone 102 is also provided which
consists of advertisements. In fact, zone 102 may include a
number of such advertisements, which can be spread around
or even be embedded in zone 100, as at 102A. In other words
the two zones 100, 102 need not be completely separated but
may overlap in any fashion desired. Moreover, some of the
advertisements such as advertisement 104B may include a
link to a different web page. For example, if the content of
zone 100 is a travel-related article describing an exotic
location, the advertisements may be for an airline providing
service to said location. This type of advertisement, as
discussed above, may include a hot link so that 'clicking' on
the advertisement may connect the reader to a web site
providing information about fares, flight schedules and/or
any other information, which may or may not be related to
the location described in the main article of zone 100.

In accordance with this invention, a third type of zone
104, 106 is provided which is associated either with the main
text of zone 100 or one of the advertisements of zone 102.
These zones are used to invite the reader to provide rating
information about the contents of the associated zone. For
instance, in the zone 104 associated with zone 100, the
reader is invited to rate the article as being one of "excellent
(E)", "good (G)", "fair (F)" or "no value (NV)." Importantly,
another field provided in zone 104 is a "comment" field. The
reader may select this field, and then write messages describing
his opinion.

Similarly, rating zone 106 associated with advertisement
zone 102 is provided to obtain similar information about the
respective advertisement. The reader may also be invited to
indicate classification (i.e., consumer or travel professional)
in the field, as at 106A.

Once the reader provides a rating, this rating is compiled
with previous ratings by other readers in a data base stored
in memory 16.

FIG. 3 shows details of operation of the system. The
system consists of several sites remote from each other,
interconnected by a computer network system such as the Internet.
Three of the sites of the system include a reader site, a host
site and an advertisement site.

Starting in step 202, a reader requests access to information
related to a subject of interest using ISP 22. At the host site,
the request is received and, in response, information is
retrieved from memory 16 (step 203) which is descriptive of
the pages of the subject site including zones 100, 102, 104,
106. This information is returned to the reader site where, in
response, the zones 100, 102, 104, 106 are displayed in
standard manner (step 204).

In step 206, the reader reads the information or otherwise
accesses it. In step 208 the reader accesses zone 104 and in
step 210 he rates the article by selecting one of the appropriate
bullets of zone 104. He may also add comments by
selecting the "comment" field as discussed above. Once the
reader has indicated his rating, he can select the 'submit'
button and the rating is transmitted to the site host. At the site
host, in step 212 the result of the rating is stored and an
acknowledgment screen or message is returned to the reader.
This acknowledgment screen or message is displayed in step
214.

In step 216, the reader accesses one of the advertisement 
zones 102 and reads the same. If the zone 102 includes a hot 
key which is activated by the reader, then in step 216, a 
message is sent to the advertising site requesting further 
information. In step 218 the requested additional informa- 
tion is retrieved and transmitted to the reader site. In step 220 
the additional information is displayed. 

In step 222 the reader accesses the associated rating zone
106 and rates the advertisement in a manner similar to the
article of zone 100. The rating is sent to the host site where
it is stored in step 224. The host site then returns an
acknowledgment. In step 226 the acknowledgment is
displayed, and in step 228 the reader signs off from the site.

As previously mentioned, after a rating is received, when 65 
a request for a rating summary is received or at regular 
intervals, cumulative rating parameters are calculated for 
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each article and advertisement. For example, the ratings 
received may be accumulated as follows. A data base is set 
up for each article and each advertisement. The data base is 
accumulated as described in the flow chart of FIG. 5. In step 
300 of the flow chart a rating is received, as described in 
detail in the flow chart of FIG. 3. This rating may indicate 
for example that a current reader found an article (for 
instance, the 'test Article 1*) to be excellent. Therefore in 
step 302 the result of this latest rating is accumulated with 
previous result, in this case by incrementing the count in the 
cell under the Excellent column in the row for the first 
article. As part of step 302, the total number of ratings for 
this article is summed. 
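
The accumulation in step 302 can be sketched as a small tally table. This is a minimal Python illustration, not the patent's implementation: the names `CATEGORIES` and `record_rating` and the dictionary layout are assumptions introduced here, and only the four column labels come from the text.

```python
from collections import Counter

# Column labels follow the text (Excellent, Good, Fair, No Value);
# the identifiers themselves are hypothetical.
CATEGORIES = ("excellent", "good", "fair", "no_value")


def record_rating(table, item, category):
    """Accumulate one rating (step 302) and return the item's running total.

    `table` maps an article or advertisement name to a Counter of
    per-category counts, standing in for the per-item data base.
    """
    if category not in CATEGORIES:
        raise ValueError(f"unknown rating category: {category!r}")
    row = table.setdefault(item, Counter())
    row[category] += 1          # increment the cell in the matching column
    return sum(row.values())    # total number of ratings for this item


table = {}
record_rating(table, "test Article 1", "excellent")
total = record_rating(table, "test Article 1", "good")
print(table["test Article 1"]["excellent"], total)
```

Keeping one counter per item mirrors the one-row-per-article layout described above, so the total is always the sum of the row.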

Similar data bases are also generated for each of the 
advertisements. The purpose of these data bases, just like for 
the articles, is to record and summarize the responses from 
consumers and industry professionals. 

In addition, for each rating, a percentage figure is shown 
indicating how many readers have voted for an article or 
advertisement as being excellent, etc. 

Getting back to FIG. 5, in step 304 a check is performed 
to determine if a request has been received for a rating 
summary. When such a request has been received, then in 
step 306 an overall rating for each of the various articles is 
calculated if not performed before. 

For example, in FIG. 4A, the test Article 1 received an 
overall rating of GOOD from a total of 164 voters. In FIG. 
4B, the advertisement 'YOURS in Travel' received an 
overall rating "Fair" from a total of 5 voters, 3 of whom (or 
60%) were consumers, and 2 (or 40%) were Industry Pro- 
fessionals. 
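
The percentages quoted for FIG. 4B follow from dividing each group's count by the total number of voters. A minimal Python sketch, assuming a hypothetical helper named `percentages`; the 3 consumers and 2 industry professionals are the figures given above.

```python
def percentages(counts):
    """Return each entry's share of the total as a percentage."""
    total = sum(counts.values())
    if total == 0:
        # No votes yet: report zero for every entry rather than dividing by zero.
        return {name: 0.0 for name in counts}
    return {name: 100.0 * n / total for name, n in counts.items()}


# Voter background for the 'YOURS in Travel' advertisement (FIG. 4B).
shares = percentages({"consumer": 3, "industry_professional": 2})
print(shares)  # {'consumer': 60.0, 'industry_professional': 40.0}
```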

The overall rating for each article or ad is calculated as 
follows. 

The value of "OR" (overall rating) is obtained by calcu- 
lating the number of entries in each rating sub-category 
(excellent, fair, good, or no value) for each article or ad. The 
percentage of the total is then calculated, then each percent- 
age is run through a set of rules that determine the rating. 

The rules basically check for any sub-category with the
largest percentage. If one is found, that sub-category value
is assigned to "OR."

If not found, "equal" status is checked between any two
neighboring sub-categories. The value of the higher of the
two categories is assigned to "OR."

If not found, either "good" or "fair" is assigned to "OR",
depending on which specific sub-categories are found to be
"equal."

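The three rules for "OR" can be sketched in Python as follows. This is one possible reading, not the patent's implementation. The ordering of the sub-categories from best to worst is assumed, the second rule is read as a tie between exactly two adjacent sub-categories, and the text does not say which ties yield "good" and which yield "fair", so the mapping in the last rule is purely an illustrative assumption. The function name `overall_rating` is hypothetical.

```python
# Assumed ordering from highest to lowest sub-category.
ORDER = ("excellent", "good", "fair", "no_value")


def overall_rating(counts):
    """Return the overall rating "OR" for one article or ad, or None if unrated."""
    votes = [counts.get(c, 0) for c in ORDER]
    if sum(votes) == 0:
        return None
    # Comparing counts is equivalent to comparing percentages,
    # because every percentage shares the same total.
    top = max(votes)
    leaders = [c for c, n in zip(ORDER, votes) if n == top]

    # Rule 1: a single sub-category has the largest percentage.
    if len(leaders) == 1:
        return leaders[0]

    # Rule 2: two neighboring sub-categories are equal; take the higher one.
    if len(leaders) == 2:
        first, second = (ORDER.index(c) for c in leaders)
        if second - first == 1:
            return leaders[0]

    # Rule 3 (ASSUMED mapping, not given in the text): "good" when
    # "excellent" is among the tied sub-categories, otherwise "fair".
    return "good" if "excellent" in leaders else "fair"


print(overall_rating({"excellent": 1, "good": 4, "fair": 4, "no_value": 0}))
```

In the example, "good" and "fair" tie as neighbors, so the second rule assigns the higher of the two.
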
In step 308 a payment amount is generated for each 
article. This payment amount may be equal to a preselected 
base fee multiplied by a special rating parameter R of the 
corresponding article. 

For example, if the ratings in the respective columns for
the subject article are E, G, F, NV then R may be expressed
as:

R=W1*E+W2*G+W3*F+W4*NV

where W1, W2, W3 and W4 are preselected weighing
factors. Typical values for these parameters may be 1.5, 1,
0, and 0.
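
The payment calculation of step 308 can be sketched in Python as follows. The function names and the sample inputs are hypothetical; only the formula and the typical weights 1.5, 1, 0, and 0 come from the text. The text does not state whether E, G, F, and NV are raw counts or percentages, so the sketch accepts whatever column values the caller supplies.

```python
# Typical weighing factors W1..W4 given in the text.
DEFAULT_WEIGHTS = (1.5, 1.0, 0.0, 0.0)


def rating_parameter(e, g, f, nv, weights=DEFAULT_WEIGHTS):
    """Compute R = W1*E + W2*G + W3*F + W4*NV."""
    w1, w2, w3, w4 = weights
    return w1 * e + w2 * g + w3 * f + w4 * nv


def payment_amount(base_fee, e, g, f, nv, weights=DEFAULT_WEIGHTS):
    """Payment is the preselected base fee multiplied by R (step 308)."""
    return base_fee * rating_parameter(e, g, f, nv, weights)


# Hypothetical column values: 10 excellent, 20 good, 5 fair, 1 no value.
r = rating_parameter(10, 20, 5, 1)
print(r, payment_amount(2.0, 10, 20, 5, 1))  # 35.0 70.0
```

With the typical weights, "fair" and "no value" ratings contribute nothing to R, so only the "excellent" and "good" columns raise the payment.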

In step 310 the data bases are sent to the requester, or
alternatively, only the total columns and the rating
parameters are sent. FIGS. 4A-4C illustrate data collected and
tabulated by rating, as well as the background (i.e.,
consumer/industry pro) of the various readers.

In step 312 a check is performed to determine if the data
bases are to be reset or edited. If not, normal processing
continues. Otherwise in step 314, a password is requested
(see bottom of FIG. 4C), and if a correct password is entered,
the data and/or the format of the data bases can be edited.

Although the communication between the requester for 
the rating data and the host site may be performed in any 
secure manner, preferably a secure, password enabled Inter- 
net access means may be used, in which case, the data bases 
can be presented in the form of web pages. 

Moreover, the various page components discussed may be 
presented to the reader in various forms. For example, 
instead of being presented co-extensively with zones 100,
102, rating zones 104, 106 may be presented as pop-up menus.

Numerous modifications may be made to this invention 
without departing from its scope as defined in the appended 
claims. 

I claim: 

1. A web site rating system comprising: 

a host site and a reader site connected to one another via 
a computer based distributed network, said host site 
comprising: 

a transmitting element for electronically transmitting at 
least one article from said host site to said reader site; 

a receiving element for receiving rating information 
from said reader site at which a reader has evaluated 
and rated said article; 

a data storage element for storing said rating informa- 
tion; 

a processing element for processing said rating infor- 
mation in response to a request for a rating summary 
and generating an overall rating of each article based 
on the rating information received from those of the 
plurality of readers who have read an article and 
have provided rating information thereon; and 

a payment generator for generating an amount to be 
paid for the article, the amount being based on a 
preselected base fee multiplied by a special rating 
parameter. 

2. The system of claim 1 wherein each reader designates 
one of a plurality of rating levels for said article, and wherein 
said processing element generates said combined ratings 
using a preselected relationship for said rating levels. 

3. The system of claim 1, wherein said rating information 
comprises a rating chosen by the reader from a plurality of 
rating categories, and 

wherein said overall rating of each article is calculated by 
calculating the total number of readers who submitted 
ratings for each article, calculating the number of 
readers who submitted ratings in each rating category 
for each article, calculating the percentage of readers 
who submitted a rating for each rating category for 
each article, and choosing the rating category for which 
the greatest percentage of readers submitted ratings. 

4. The system of claim 1, wherein said rating information 
comprises a rating chosen by the reader from a plurality of 
rating categories (E, G, F, NV), and 

wherein said special rating parameter (R) is calculated 
using the following algorithm: 

R=W1*E+W2*G+W3*F+W4*NV,

where W1, W2, W3 and W4 are preselected weighing
factors.

5. A method of generating an amount to be paid for an 
article stored on a host site of a distributed network based 
multiple computer information distribution system, said 
method comprising: 

transmitting said article to a plurality of readers at a reader
site;

receiving from said readers ratings associated with said
article;

storing said ratings in a data storage element;

processing said ratings in response to a request for a rating
summary;

generating an overall rating for said article for which 
ratings have been received; 

transmitting said overall rating to a requester together 
with the number of readers that provided ratings; and 

generating the amount to be paid for the article, the 
amount being based on a preselected base fee multi- 
plied by a special rating parameter. 

6. A method of generating an amount to be paid for an 
article stored on a host site of a distributed network based 
multiple computer information distribution system, said 
method comprising: 

transmitting the article from the host site to a reader site;

receiving rating information from the reader site at which
a reader has evaluated and rated the article;

storing the rating information; and

generating an amount to be paid for the article, the amount 
being based on the rating information. 

7. The method of claim 5 further comprising generating a 
spread sheet descriptive of the article on a page of said host 
site, together with corresponding rating parameters, receiv- 
ing rating information requests and transmitting in response 
to said requests said spread sheet. 

8. The method of claim 5 further comprising displaying a 
rating menu associated with said article. 

9. The method of claim 5 further comprising displaying a 
rating menu associated with said article, said rating menu 
indicating a plurality of rating levels; and selecting one of 
said levels by said reader to generate said rating. 

10. The method of claim 9 comprising providing said 
menu with a reader skill level selection choice, wherein said 
rating generated by reader includes a reader skill parameter 
related to said reader skill. 

11. The method of claim 10 further comprising generating 
a spread sheet indicative of ratings selected by readers for 
various articles. 

12. The method of claim 11 further comprising several 
spread sheets, each spread sheet being associated with a 
reader skill level. 

13. A method of generating an overall rating of an 
advertisement on a host site of a distributed network based 
multiple computer information distribution system, said 
method comprising: 

transmitting said advertisement to a plurality of readers at 
a reader site; 

receiving from said readers ratings associated with said 
advertisement together with the reader's classification 
in a particular field; 

storing said ratings in a data storage element; 

processing said ratings in response to a request for a rating 
summary; 

generating an overall rating for said advertisement for
which ratings have been received;

transmitting said overall rating to a requester; and

the requester using said overall rating to determine the
effectiveness of said advertisement.

14. The method of claim 13, wherein said receiving step 
comprises receiving from said readers ratings associated 
with said advertisement together with information regarding 
whether the reader is a consumer or a tradesperson.

Appendix C: Related Proceedings



[NONE] 